ecs251 Fall 2007: Operating System Models #1: File Systems

ecs251 Fall 2007:
Operating System Models
#1: File Systems
Dr. S. Felix Wu
Computer Science Department
University of California, Davis
http://www.cs.ucdavis.edu/~wu/
[email protected]
File Systems
• Information organization, access, and storage
• Reliability
• Content-centric computing & networking
  – Database, Web/Google, Bittorrent, emails
[Figure: kernel I/O layering, top to bottom]
System-call interface
Active file entries
VNODE layer or VFS
Local naming (UFS)
FFS
Buffer cache
Block or character device driver
Hardware
File → Disk
• separate the disk into blocks
• separate the file into blocks as well
• “paging” from file to disk
blocks: 4 - 7 - 2 - 10 - 12
How do we represent the file? How do we link these 5 blocks together?
Bit Torrent pieces
• 1 big file (X gigabytes) with a number of pieces (5%) already in (and being shared with others).
• How much disk space do we need at this moment?
File → Disk blocks
file block 0 → disk block 4
file block 1 → disk block 7
file block 2 → disk block 2
file block 3 → disk block 10
file block 4 → disk block 12
What are the disadvantages?
1. Disk access can be slow for “random access”.
2. How big is each block? 2^X bytes? 2^X + 8 bytes?
One Logical File → Physical Disk Blocks
efficient representation & access
[Figure: a file mapped through an i-node; ??? entries in one disk block]
struct ufs2_dinode {
    u_int16_t di_mode;          /* 0: IFMT, permissions; see below. */
    int16_t di_nlink;           /* 2: File link count. */
    u_int32_t di_uid;           /* 4: File owner. */
    u_int32_t di_gid;           /* 8: File group. */
    u_int32_t di_blksize;       /* 12: Inode blocksize. */
    u_int64_t di_size;          /* 16: File byte count. */
    u_int64_t di_blocks;        /* 24: Bytes actually held. */
    ufs_time_t di_atime;        /* 32: Last access time. */
    ufs_time_t di_mtime;        /* 40: Last modified time. */
    ufs_time_t di_ctime;        /* 48: Last inode change time. */
    ufs_time_t di_birthtime;    /* 56: Inode creation time. */
    int32_t di_mtimensec;       /* 64: Last modified time. */
    int32_t di_atimensec;       /* 68: Last access time. */
    int32_t di_ctimensec;       /* 72: Last inode change time. */
    int32_t di_birthnsec;       /* 76: Inode creation time. */
    int32_t di_gen;             /* 80: Generation number. */
    u_int32_t di_kernflags;     /* 84: Kernel flags. */
    u_int32_t di_flags;         /* 88: Status flags (chflags). */
    int32_t di_extsize;         /* 92: External attributes block. */
    ufs2_daddr_t di_extb[NXADDR]; /* 96: External attributes block. */
    ufs2_daddr_t di_db[NDADDR]; /* 112: Direct disk blocks. */
    ufs2_daddr_t di_ib[NIADDR]; /* 208: Indirect disk blocks. */
    int64_t di_spare[3];        /* 232: Reserved; currently unused */
};
struct ufs1_dinode {
    u_int16_t di_mode;          /* 0: IFMT, permissions; see below. */
    int16_t di_nlink;           /* 2: File link count. */
    union {
        u_int16_t oldids[2];    /* 4: Ffs: old user and group ids. */
    } di_u;
    u_int64_t di_size;          /* 8: File byte count. */
    int32_t di_atime;           /* 16: Last access time. */
    int32_t di_atimensec;       /* 20: Last access time. */
    int32_t di_mtime;           /* 24: Last modified time. */
    int32_t di_mtimensec;       /* 28: Last modified time. */
    int32_t di_ctime;           /* 32: Last inode change time. */
    int32_t di_ctimensec;       /* 36: Last inode change time. */
    ufs1_daddr_t di_db[NDADDR]; /* 40: Direct disk blocks. */
    ufs1_daddr_t di_ib[NIADDR]; /* 88: Indirect disk blocks. */
    u_int32_t di_flags;         /* 100: Status flags (chflags). */
    int32_t di_blocks;          /* 104: Blocks actually held. */
    int32_t di_gen;             /* 108: Generation number. */
    u_int32_t di_uid;           /* 112: File owner. */
    u_int32_t di_gid;           /* 116: File group. */
    int32_t di_spare[2];        /* 120: Reserved; currently unused */
};
Bittorrent pieces
• File size: 10 GB
• Pieces downloaded: 512 MB
• How much disk space do we need?
Welcome to ecs251
Alagesan, Shri Vidhya
Anand, Manish K.
Apple, James B.
Bhattacharyya, Prantik
Brasseur, Brian A.
Cai, Liang
Chan, An
Chan, Kelcey
Ding, Dan
Gulati, Supriya S.
Hadke, Amit A.
Harve, Shruthi K.
Jones, Chad E.
Kim, Yejin
Kishore, Apoorv
Kwon, Taeho
Lorca-Martinez, Daniel A.
Mason, Blake
Muppala, Yugandhar
Nelson, Jarom R.
Schwarz, Christopher G.
Shafii, Sohail S.
Shi, Lei
Singh, Anhad P.
Stegers, Till J.
Sukharev, Jeffrey
Thomason, Russell D.
Xia, Ming
Xu, Dan
Subjects to cover
• File Systems
• Distributed File Systems (NFS/AFS/Coda)
• Peer-to-Peer Systems (Bittorrent, Chord, Web 2.0)
• Social Network Systems (Facebook, LinkedIn, Orkut, MySpace, …)
• Transaction, Recovery & Distributed Commit
• Trust and Reputation Systems
• Self-Stabilization Systems (if time available)
Class website
• http://www.cs.ucdavis.edu/~wu/ecs251
About the Instructor
• S. Felix Wu
  – [email protected]
  – Facebook group (under UC Davis network): “ecs251”
• Office: 3057 Engineering II
• Phone: 530-754-7070
• Office Hours:
  – 10-11 a.m. on Monday and Friday
  – by appointment
Why 3 email addresses?
– [email protected]
  Read and responded to during the quarter, especially before homework deadlines.
– [email protected]
  My main email contact for everything, all the time.
– [email protected]
  Read only once in the past three months…
Anti-Spam
• [email protected]
• subject: [ecs251_f2007]…
• ecs251_f2007_0927 is the cyber social link between the instructor and the students in ecs251, Fall 2007.
Textbook
• A set of papers on these subjects
Requirements
• Midterm: 30% (11/01/2007)
• Final Project: 70% (1~2 persons)
  – Pre-proposal (10%, 10/18/2007)
  – Proposal/Design/Prototype (30%, 11/20/2007)
  – Final Demo & Report (30%, 12/11/2007)
Projects (10/02/2007)
• On-demand/interactive Bittorrent
  – Kernel, BT client, & applications
• Transaction-oriented File System
• P2P-X over Internet
  – X = Facebook, Google, eBay, ud-DFS, Data Center
• “Internet-less information system”
  – MANET-based systems
• Social network information crawling
On-Demand/Interactive BT
• BT piece selection: rarest-first plus priority
• Downloading xyz.rm, and we want to fast-forward to a particular part of the file
• Can “the kernel file system” pass that particular need to the BT client?
  – E.g., the read system call blocks until the BT client obtains the needed pieces.
• Advantage: no need to modify the applications!
• Disadvantages: ???
Transaction-oriented FS
• Metadata file-system consistency:
  – FreeBSD: Soft Updates
  – Linux: Journaling
• Can we relax “All or Nothing”?
  – A better user interface for recovering partial file modifications.
P2P over X
• Facebook
• Google
• eBay
• MegaUpload
• Problems with centralized service providers???
DaceBook
• Davis version of FaceBook
• Peer-to-peer social networking
• Community gateways to other social networks such as Facebook
• Socially here, information here
Internet-less Information Sharing
Ad hoc collaboration for content sharing in an infrastructure-less heterogeneous network.
[Figure: nodes S (WiFi), A (Bluetooth), B (WiFi + Bluetooth), C (WiMAX), D (WiFi + WiMAX), E (WiFi), F (WiFi + WiMAX)]
Infrastructure-Less
• “No IP address, DNS, or Web servers”
• Purely P2P from Layers 3 to 7
Social Network Crawling
Social Networking
• Social network services
  – Friendster, MySpace, Facebook, Orkut, LinkedIn…
• Social links, interest keywords, search & community services
Social Networks: # of Users
http://en.wikipedia.org/wiki/List_of_social_networking_sites
• FaceBook     ~34M
• Friendster   ~47M
• Hi5          ~50M
• LinkedIn     ~12M
• MySpace      ~200M
• Orkut        ~47M
• Yahoo! 360°  ~4M
• … among many others.
Social Network Access Interfaces
• FBML - Facebook Markup Language
• FQL - Facebook Query Language
• REST (REpresentational State Transfer) Web API
• FBJS - Implementation of JavaScript on the Facebook platform
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* for sleep() */

int
main(void)
{
    FILE *f1 = fopen("./sss.txt", "w");
    int i;

    /* 1000 small writes at random offsets: creates a large, sparse file */
    for (i = 0; i < 1000; i++) {
        fseek(f1, rand(), SEEK_SET);
        fprintf(f1, "%d%d%d%d", rand(), rand(), rand(), rand());
        if (i % 100 == 0) sleep(1);
    }
    fflush(f1);
    return 0;
}

# ./t
# ls -l ./sss.txt
di_size vs. di_blocks
• Logical size: di_size (reported by fstat)
• Physical size: di_blocks (reported by du)
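A minimal sketch (not from the slides) that prints both numbers with stat(2): st_size is the logical byte count that fstat and ls -l report, while st_blocks counts the 512-byte units actually allocated, which is what du adds up:

#include <stdio.h>
#include <sys/stat.h>

int
main(int argc, char *argv[])
{
    struct stat sb;

    if (argc < 2 || stat(argv[1], &sb) != 0) {
        perror("stat");
        return 1;
    }
    printf("logical  (di_size):   %lld bytes\n", (long long)sb.st_size);
    printf("physical (di_blocks): %lld bytes\n",
        (long long)sb.st_blocks * 512LL);   /* st_blocks is in 512-byte units */
    return 0;
}

Run it on the sss.txt file created above and the two numbers diverge sharply: the file is sparse.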
One Logical File → Physical Disk Blocks
efficient representation & access
[Figure: a file mapped through an i-node; ??? entries in one disk block. Typical: each block is 1K]
i-node
• How many disk blocks can a FS have?
• How many levels of i-node indirection will be necessary to store a file of 2G bytes? (i.e., 0, 1, 2 or 3)
• What is the largest possible file size in i-node?
• What is the size of the i-node itself for a file of 10GB with only 512 MB downloaded?
Answer
• How many disk blocks can a FS have?
  – 2^64 or 2^32: the pointer (to blocks) size is 8/4 bytes.
• How many levels of i-node indirection will be necessary to store a file of 2G (2^31) bytes? (i.e., 0, 1, 2 or 3)
  – 12*2^10 + 2^8*2^10 + 2^8*2^8*2^10 + 2^8*2^8*2^8*2^10 >? 2^31
• What is the largest possible file size in i-node?
  – 12*2^10 + 2^8*2^10 + 2^8*2^8*2^10 + 2^8*2^8*2^8*2^10
  – 2^64 - 1
  – 2^32 * 2^10
You need to consider three issues and find the minimum!
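As a sanity check on the arithmetic, a small C sketch assuming the parameters used above (1K blocks, 12 direct pointers, and 2^8 = 256 four-byte pointers per indirection block):

#include <stdio.h>

int
main(void)
{
    const unsigned long long bs  = 1ULL << 10;  /* block size: 1K */
    const unsigned long long ppb = 1ULL << 8;   /* pointers per indirection block */
    unsigned long long direct = 12 * bs;
    unsigned long long single = ppb * bs;
    unsigned long long dbl    = ppb * ppb * bs;
    unsigned long long triple = ppb * ppb * ppb * bs;

    printf("direct+single+double: %llu bytes\n", direct + single + dbl);
    printf("plus triple indirect: %llu bytes\n", direct + single + dbl + triple);
    return 0;
}

Direct plus single plus double indirect cover only about 64 MB, so a 2G (2^31) file does need the triple indirect level.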
Answer: Lower Bound
• How many pointers?
  – 512MB divided by the block size (1K)
  – 512K pointers times 8 (4) bytes = 4 (2) MB
Answer: Upper Bound
• In the worst case, EVERY indirection block has at least one entry!
• How many indirection blocks?
  – Single: 1 block
  – Double: 1 + 2^8
  – Triple: 1 + 2^8 + 2^16
• Total ~ 2^16 blocks times 1K = 64 MB
  – 2^14 times 1K = 16 MB (ufs2 inode)
Answer (4)
• ufs1: 2 MB ~ 64 MB
• ufs2: 4 MB ~ 16 MB
• Answer: sss.txt ~ 17 MB
  – ~16 MB (inode indirection blocks)
  – 1000 writes times 1K ~ 1 MB
A File System
[Figure: a disk divided into partitions; a partition holds the boot block (b), the superblock (s), the i-list of i-nodes (…….), and the directory and data blocks (d)]
dirp = opendir(const char *filename);
struct dirent *direntp = readdir(dirp);

struct dirent {
    ino_t d_ino;
    char d_name[NAME_MAX+1];
};

[Figure: a directory holds dirents; each dirent pairs an inode number with a file_name, and each inode points to a file]
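A small sketch built from exactly these calls; it lists a directory as the (inode number, file_name) pairs in the figure:

#include <stdio.h>
#include <dirent.h>

int
main(int argc, char *argv[])
{
    DIR *dirp = opendir(argc > 1 ? argv[1] : ".");
    struct dirent *direntp;

    if (dirp == NULL)
        return 1;
    while ((direntp = readdir(dirp)) != NULL)   /* one dirent per entry */
        printf("%8lu  %s\n", (unsigned long)direntp->d_ino,
            direntp->d_name);
    closedir(dirp);
    return 0;
}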
[Figure: on-disk layout of a small tree. directory / (inode 2, root wheel drwxr-xr-x, Apr 1 2004) holds ". 2", ".. 2", "usr 4", "vmunix 5"; directory /usr (inode 4) holds ". 4", ".. 2", "bin 7", "foo 6"; directory /usr/bin (inode 7, bin bin rwxr-xr-x, Apr 15 2004) holds ". 7", ".. 4", "ex 9", "groff 10", "vi 9". /vmunix is the file at inode 5 (root wheel rwxr-xr-x); /usr/foo is the file at inode 6 (kirk staff rw-rw-r--, Jan 19 2004) whose data block reads "Hello World!"; /usr/bin/vi is the file at inode 9, which "ex" also names]
struct dirent {
    ino_t d_ino;
    char d_name[NAME_MAX+1];
};

struct stat { …
    short nlinks;
… };

[Figure: the same directory/dirent/inode/file picture; nlinks counts the directory entries referring to an inode]
What is the difference?
• ln -s /usr/src/sys/sys/proc.h ppp.h
• ln /usr/src/sys/sys/proc.h ppp.h
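A small experiment that shows the difference (qqq.h is a hypothetical second name added here for contrast): link(2) creates another directory entry for the same i-node and bumps its link count, while symlink(2) allocates a new i-node whose data is just the target pathname:

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int
main(void)
{
    struct stat sb;

    link("/usr/src/sys/sys/proc.h", "ppp.h");      /* hard link */
    symlink("/usr/src/sys/sys/proc.h", "qqq.h");   /* symbolic link */

    if (stat("ppp.h", &sb) == 0)    /* same inode as proc.h; nlink grows */
        printf("ppp.h: inode %lu, nlink %u\n",
            (unsigned long)sb.st_ino, (unsigned)sb.st_nlink);
    if (lstat("qqq.h", &sb) == 0)   /* the symlink's own inode */
        printf("qqq.h: inode %lu, nlink %u\n",
            (unsigned long)sb.st_ino, (unsigned)sb.st_nlink);
    return 0;
}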
File System Buffer Cache
application: read/write files
OS:          translates files to disk blocks; maintains the buffer cache
hardware:    controls disk accesses: read/write blocks
Any problems?
File System Consistency
• To maintain file system consistency, the ordering of updates from the buffer cache to disk is critical
• Example:
  – if the directory block is written back before the i-node and the system crashes, the directory structure will be inconsistent
File System Consistency
• File systems almost always use a buffer/disk cache for performance reasons
• This problem is especially critical for blocks that contain control information: i-node, free list, directory blocks
• Two copies of a disk block (buffer cache, disk) → consistency problem if the system crashes before all the modified blocks are written back to disk
• Write critical blocks from the buffer cache back to disk immediately
• Data blocks are also written back periodically: sync
Two Strategies
• Prevention
  – Use unbuffered I/O when writing i-nodes or pointer blocks
  – Use buffered I/O for other writes and force a sync every 30 seconds
• Detect and Fix
  – Detect the inconsistencies
  – Fix them according to the “rules”
  – fsck (File System Checker)
File System Integrity
• Block consistency:
  – Block-in-use table: 0 1 1 1 0 0 0 1 0 0 0 2
  – Free-list table:    1 0 0 0 1 1 1 0 1 0 2 0
  (a count of 2 in either table signals a duplicate reference)
• File consistency:
  – how many directories point to that i-node?
  – nlink?
  – three cases: D == L, L > D, D > L
What to do with the latter two cases?
File System Integrity
• File system states:
  (a) consistent
  (b) missing block
  (c) duplicate block in free list
  (d) duplicate data block
Metadata Operations
• Metadata operations modify the structure of the file system
  – Creating, deleting, or renaming files, directories, or special files
  – Directory & i-node
• Data must be written to disk in such a way that the file system can be recovered to a consistent state after a system crash
Metadata Integrity
• FFS uses synchronous writes to guarantee the integrity of metadata
  – Any operation modifying multiple pieces of metadata writes its data to disk in a specific order
  – These writes are blocking
• Guarantees integrity and durability of metadata updates
Deleting a file (I)
[Figure: a directory block with entries abc → i-node-1, def → i-node-2, ghi → i-node-3]
Assume we want to delete file “def”
Deleting a file (II)
[Figure: the same directory block, but entry “def” now points at a question mark where i-node-2 used to be]
Cannot delete the i-node before the directory entry “def”
Deleting a file (III)
• Correct sequence is
  1. Write to disk the directory block containing the deleted directory entry “def”
  2. Write to disk the i-node block containing the deleted i-node
• Leaves the file system in a consistent state
Creating a file (I)
[Figure: a directory block with entries abc → i-node-1 and ghi → i-node-3]
Assume we want to create new file “tuv”
Creating a file (II)
[Figure: the same directory block with a new entry “tuv” pointing at a question mark, the not-yet-written i-node]
Cannot write the directory entry “tuv” before the i-node
Creating a file (III)
• Correct sequence is
  1. Write to disk the i-node block containing the new i-node
  2. Write to disk the directory block containing the new directory entry
• Leaves the file system in a consistent state
Synchronous Updates
• Used by FFS to guarantee consistency of metadata:
  – All metadata updates are done through blocking writes
• Increases the cost of metadata updates
• Can significantly impact the performance of the whole file system
SOFT UPDATES
• Use delayed writes (write back)
• Maintain dependency information about cached pieces of metadata:
  “this i-node must be updated before/after this directory entry”
• Guarantee that metadata blocks are written to disk in the required order
3 Soft Update Rules
• Never point to a structure before it has been initialized.
• Never reuse a resource before nullifying all previous pointers to it.
• Never reset the old pointer to a live resource before the new pointer has been set.
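A toy sketch of the bookkeeping the rules imply (this is not the FreeBSD implementation): each dirty buffer records which buffer must reach the disk before it, and the flush routine honors that order:

#include <stdio.h>

struct buf {
    const char *name;
    int dirty;
    struct buf *must_write_first;   /* dependency, or NULL */
};

static void
flush(struct buf *bp)
{
    if (bp == NULL || !bp->dirty)
        return;
    flush(bp->must_write_first);    /* e.g. the i-node before the dir entry */
    printf("write %s\n", bp->name); /* stands in for the real disk write */
    bp->dirty = 0;
}

int
main(void)
{
    struct buf inode = { "i-node block", 1, NULL };
    struct buf dir   = { "directory block", 1, &inode };

    flush(&dir);    /* the i-node block goes out first: rule 1 */
    return 0;
}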
Problem #1 with S.U.
• Synchronous writes guaranteed that metadata operations were durable once the system call returned
• Soft Updates guarantees that the file system will recover into a consistent state, but not necessarily the most recent one
  – Some updates could be lost
What are the dependency relationships?
We want to delete file “foo” and create new file “bar”.
[Figure: directory block A holds the old entry “foo” and the NEW entry “bar”; block B holds foo’s i-node-2 and the NEW i-node-3]
Circular Dependency
We want to delete file “foo” and create new file “bar”.
[Figure: the same blocks A and B, with one update marked Y-1st and the other X-2nd: each block must be written both before and after the other]
Problem #2 with S.U.
• Cyclical dependencies:
  – The same directory block contains entries to be created and entries to be deleted
  – These entries point to i-nodes in the same block
• Brainstorming:
  – How to resolve this issue in S.U.?
How to update? i-node first or directory block first?
Solution in S.U.
• Roll back the metadata in one of the blocks to an earlier, safe state
  Block A’: def
  (The safe state does not contain the new directory entry)
• Write first the block with metadata that was rolled back (block A’ of the example)
• Write the blocks that can be written after the first block has been written (block B of the example)
• Roll forward the block that was rolled back
• Write that block
• Breaks the cyclical dependency, but block A must now be written twice
Before any write operation:
  SU dependency checking (roll back if necessary)
After any write operation:
  SU dependency processing (task-list updating; roll forward if necessary)
• The two most popular approaches for improving the performance of metadata operations and recovery:
  – Journaling
  – Soft Updates
• Journaling systems record metadata operations on an auxiliary log
• Soft Updates uses ordered writes
JOURNALING
• Journaling systems maintain an auxiliary log that records all metadata operations
• Write-ahead logging ensures that the log is written to disk before any blocks containing data modified by the corresponding operations.
  – After a crash, the log can be replayed to bring the file system to a consistent state
Journaling
• Atomically updated
• Old and new versions of the data are held on disk until the update commits
• Undo logging:
  – Copy the old data to the log
  – Write the new data to disk
  – If you crash during the update, copy the old data back from the log
• Redo logging:
  – Write the new data to the log
  – The old data remains on disk
  – If you crash, copy the new data from the log
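A toy redo-logging model (all names here are hypothetical; a real journal lives on disk and forces each step out with a sync): the new data goes to the log first, the commit makes the transaction final, and only then are the home blocks overwritten:

#include <stdio.h>
#include <string.h>

struct rec { int blkno; char data[16]; };

static struct rec logrec[64];   /* the journal */
static int loglen;
static char disk[8][16];        /* the home locations */

static void
tx_write(int blkno, const char *data)
{
    logrec[loglen].blkno = blkno;       /* redo logging: log first */
    strcpy(logrec[loglen].data, data);
    loglen++;
}

static void
tx_commit(void)
{
    /* the commit record is the atomic point: a crash before it
       replays nothing, a crash after it replays everything */
    for (int i = 0; i < loglen; i++)
        strcpy(disk[logrec[i].blkno], logrec[i].data);
    loglen = 0;
}

int
main(void)
{
    tx_write(2, "new i-node");
    tx_write(5, "new dirent");
    tx_commit();
    printf("%s / %s\n", disk[2], disk[5]);
    return 0;
}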
How does it work?
• Each disk update is a transaction (atomic update)
  – Write the new data to the disk (journal)
  – The update is not final until a commit
• Only after the commit block is written is the update final
  – The commit block is a single block of data on the disk
  – Not necessarily flushed to disk yet!
JOURNALING
• Log writes are performed in addition to the regular writes
• Journaling systems incur log-write overhead, but
  – Log writes can be performed efficiently because they are sequential (block operation consideration)
  – Metadata blocks do not need to be written back after each update
JOURNALING
• Journaling systems can provide
  – the same durability semantics as FFS if the log is forced to disk after each metadata operation
  – the laxer semantics of Soft Updates if log writes are buffered until entire buffers are full
Versus log-structured file systems
• A log-structured file system contains ONLY a log; everything is written to the end of this log
• An LSFS dictates how the data is stored on disk
• A JFS (journaling FS) does not dictate how the data is stored on disk
Soft Updates vs. Journaling
• Advantages?
• Disadvantages?
With Soft Updates??
Do we still need “FSCK” at boot time?
Recover the Missing Resources
• In the background, in an active FS…
  – We don’t want to wait for the lengthy FSCK process to complete…
• A related issue:
  – the virus-scanning process
  – what happens if we get a new virus signature?
Snapshot of the FS
• backup and restore
• reliably dump an active file system
  – what would we do today to dump a “consistent” snapshot of our 40GB FS? (at midnight…)
• “background FSCK checks”
What is a snapshot? (I mean “conceptually”.)
• Freeze all activities related to the FS.
• Copy everything to “some space”.
• Resume the activities.
How do we efficiently implement this concept such that the activities will only be blocked for about 0.25 seconds, and we don’t have to buy a really big hard drive?
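One answer is copy-on-write; a toy model (not the FFS snapshot code) of why taking the snapshot is nearly free: it only allocates a table of "not yet copied" markers, and a block is copied out the first time the live file system overwrites it:

#include <stdio.h>
#include <string.h>

#define NBLK 4
static char disk[NBLK][16] = { "A0", "B0", "C0", "D0" };
static char snap[NBLK][16];
static int copied[NBLK];    /* 0 = snapshot still shares the live block */

static void
fs_write(int b, const char *data)
{
    if (!copied[b]) {               /* first overwrite since the snapshot */
        strcpy(snap[b], disk[b]);   /* copy the old block out */
        copied[b] = 1;
    }
    strcpy(disk[b], data);
}

static const char *
snap_read(int b)
{
    return copied[b] ? snap[b] : disk[b];
}

int
main(void)
{
    fs_write(1, "B1");
    printf("live %s, snapshot %s\n", disk[1], snap_read(1));   /* B1 B0 */
    printf("live %s, snapshot %s\n", disk[2], snap_read(2));   /* C0 C0 */
    return 0;
}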
Copy-on-Write
Snapshot: a file
Logical size versus physical size
Example
# mkdir /backups/usr/noon
# mount -u -o snapshot /usr/snap.noon /usr
# mdconfig -a -t vnode -u 0 -f /usr/snap.noon
# mount -r /dev/md0 /backups/usr/noon
/* do whatever you want to test it */
# umount /backups/usr/noon
# mdconfig -d -u 0
# rm -f /usr/snap.noon
[Figure: a file and its i-node; ??? entries in one disk block]
[Figure: a snapshot i-node; its entries are “not used” or “not yet copied”]
[Figure: copy-on-write; an entry stays “not used” or “not yet copied” until the corresponding live block is first overwritten, at which point the old block is copied into the snapshot]
Multiple Snapshots
• about 20 snapshots
• interactions/sharing among snapshots
Transaction-based FS
• Performance versus consistency
• “Atomic Writes” on multiple blocks
  – See the paper “Atomic Writes for Data Integrity and Consistency in Shared Storage Devices for Clusters” by Okun and Barak, FGCS, vol. 20, pages 539-547, 2004.
  – Modifies SCSI handling
File System Mounting
• A file system must be mounted before it can be accessed.
• An unmounted file system is mounted at a mount point.
Mount Point
logical disks
[Figure: fs0 (/dev/hd0a) holds the root tree /, usr, sys, dev, lib, bin, etc; fs1 (/dev/hd0e) holds /, local, adm, home, bin]
mount -t ufs /dev/hd0e /usr
mount -t nfs 152.1.23.12:/export/cdrom /mnt/cdrom
Distributed FS
• Distributed File Systems
  – NFS (Network File System)
  – AFS (Andrew File System)
  – GFS (Google File System)
  – CODA
Distributed FS
[Figure: ftp.cs.ucdavis.edu exports fs0 (/dev/hd0a) with /, usr, sys, dev, lib, bin, etc; Server.yahoo.com exports fs0 (/dev/hd0e) with /, local, adm, home]
Distributed File System
• Transparency and Location Independence
• Reliability and Crash Recovery
• Scalability and Efficiency
• Correctness and Consistency
• Security and Safety
Correctness
• One-copy Unix Semantics
  – every modification to every byte of a file has to be immediately and permanently visible to every client.
  – conceptually: sequential FS access
• Makes sense in a local file system
  – single processor versus shared memory
• Is this necessary?
DFS Architecture
• Server
  – storage for the distributed/shared files.
  – provides an access interface for the clients.
• Client
  – consumer of the files.
  – runs applications in a distributed environment.
[Figure: applications issue open, read, close, write, stat, opendir, readdir through the client]
NFS (SUN, 1985)
• Based on RPC (Remote Procedure Call) and XDR (eXternal Data Representation)
• Server maintains no state
  – a READ on the server opens, seeks, reads, and closes
  – a WRITE is similar, but the buffer is flushed to disk before closing
• Server crash: the client continues to retry until the server reboots – no loss
• Client crash: the client must rebuild its own state – no effect on the server
RPC - XDR
• RPC: standard protocol for calling procedures on another machine
  – the procedure call is packaged with authorization and admin info
• XDR: standard format for data, because computer manufacturers cannot agree on byte ordering.
rpcgen
[Figure: an RPC program definition is fed to rpcgen, which emits RPC.h and the shared data structures for both RPC client.c and RPC server.c]
NFS Operations
• Every operation is independent: the server opens the file for every operation
• A file is identified by a handle – no state information is retained by the server
• The client maintains the mount table, v-node, offset in the file table, etc.
What do these imply???
[Figure: NFS architecture. On each client computer, application programs make UNIX system calls into the kernel’s virtual file system; operations on local files go to the UNIX file system (or another local file system), while operations on remote files go to the NFS client. The NFS client talks to the NFS server on the server computer through the NFS protocol (remote operations), and the NFS server serves them from its own UNIX file system]

mount -t nfs home.yahoo.com:/pub/linux /mnt/linux
State-ful vs. State-less
• A state-ful server is fully aware of its clients
  – does the client have the newest copy?
  – what is the offset of an opened file?
  – “a session” between a client and a server!
• A state-less server is completely unaware of its clients
  – memory-less: I do not remember you!!
  – Just tell me what you want to get (and where).
  – I am not responsible for your offset values (the client needs to maintain that state).
The State
[Figure: applications issue open, read, stat, lseek in both designs; in the state-ful design the server tracks per-client state such as the file offset, while in the state-less design the offset stays with the client]
Network File Sharing
• Server side:
  – rpcbind (portmap)
  – mountd – responds to mount requests (sometimes called rpc.mountd); relies on several files: /etc/dfs/dfstab, /etc/exports, /etc/netgroup
  – nfsd – serves files – actually a call into kernel-level code.
  – lockd – file-locking daemon.
  – statd – manages locks for lockd.
  – rquotad – manages quotas for exported file systems.
Network File Sharing
• Client side:
  – biod – client-side caching daemon
  – mount must understand the hostname:directory convention.
  – Filesystem entries in /etc/[v]fstab tell the client what filesystems to mount.
Unix file semantics
• NFS:
  – open a file in read-write mode
  – later, the server’s copy becomes read-only mode
  – now, the application tries to write it!!
Problems with NFS
• Performance is not scalable:
  – maybe it is OK for a local office.
  – it will be horrible with large-scale systems.
• Similar to UNIX file caching for local files:
  – pages (blocks) from disk are held in a main-memory buffer cache until the space is required for newer pages; read-ahead and delayed-write optimisations.
  – For local files, writes are deferred to the next sync event (30-second intervals).
  – This works well in the local context, where files are always accessed through the local cache, but in the remote case it doesn't offer the necessary synchronization guarantees to clients.
• NFS v3 servers offer two strategies for updating the disk:
  – write-through – altered pages are written to disk as soon as they are received at the server. When a write() RPC returns, the NFS client knows that the page is on the disk.
  – delayed commit – pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients. A commit() is issued by the client whenever a file is closed.
• Server caching does nothing to reduce RPC traffic between client and server
  – further optimisation is essential to reduce server load in large networks
  – the NFS client module caches the results of read, write, getattr, lookup and readdir operations
  – synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients are sharing the same file.
• Timestamp-based validity check
  – reduces inconsistency, but doesn't eliminate it
  – validity condition for cache entries at the client:
      (T - Tc < t) ∨ (Tm_client = Tm_server)
    where t is the freshness guarantee, Tc the time when the cache entry was last validated, Tm the time when the block was last updated at the server, and T the current time
  – t is configurable (per file) but is typically set to 3 seconds for files and 30 secs. for directories
  – it remains difficult to write distributed applications that share files with NFS
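The condition transcribed as a C predicate (a sketch, with times as plain integers): an entry is usable either because it was validated recently, or because a getattr shows the server's copy unchanged:

#include <stdio.h>

static int
cache_entry_valid(long T, long Tc, long t, long Tm_client, long Tm_server)
{
    return (T - Tc < t) || (Tm_client == Tm_server);
}

int
main(void)
{
    /* validated 2 seconds ago with t = 3s: fresh, no RPC needed */
    printf("%d\n", cache_entry_valid(10, 8, 3, 100, 101));
    /* stale, but the server's copy is unchanged: still valid */
    printf("%d\n", cache_entry_valid(10, 1, 3, 100, 100));
    return 0;
}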
AFS
• State-ful clients and servers.
• Caching files at the clients.
  – File close ==> check in the changes.
• How to maintain consistency?
  – Using a “Callback” in v2/3 (Valid or Cancelled)
[Figure: applications open and read the cached copy; the server invalidates the callback and the client re-caches]
Why AFS?
• Shared files are infrequently updated
• Local cache of a few hundred megabytes
  – Now 50~100 gigabytes
• Unix workload:
  – Files are small; read operations dominate; sequential access is common; files are read/written by one user; reference bursts.
  – Are these still true?
Fault Tolerance in AFS
• a server crashes
• a client crashes
  – check for call-back tokens first.
Problems with AFS
• Availability
• What happens if the call-back itself is lost??
GFS – Google File System
• “failures” are the norm
• Multi-GB files are common
• Append rather than overwrite
  – Random writes are rare
• Can we relax the consistency?
The Master
• Maintains all file system metadata:
  – namespace, access control info, file-to-chunk mappings, chunk (including replica) locations, etc.
• Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state
The Master
• Helps make sophisticated chunk placement and replication decisions, using global knowledge
• For reading and writing, a client contacts the Master to get chunk locations, then deals directly with chunkservers
  – the Master is not a bottleneck for reads/writes
Chunkservers
• Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle.
  – the handle is assigned by the master at chunk creation
• Chunk size is 64 MB
• Each chunk is replicated on 3 (default) servers
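Because the chunk size is fixed, a client can compute which chunk a byte offset falls in entirely on its own, and only then ask the master for that chunk's handle and replica locations; a small sketch of the arithmetic:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    const uint64_t CHUNK = 64ULL << 20;     /* 64 MB chunks */
    uint64_t offset = 200ULL << 20;         /* byte 200M of some file */

    printf("chunk index %llu, offset within chunk %llu\n",
        (unsigned long long)(offset / CHUNK),   /* 3 */
        (unsigned long long)(offset % CHUNK));  /* 8 MB */
    return 0;
}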
Clients
• Linked into apps via the file system API.
• Communicate with the master and chunkservers for reading and writing
  – Master interactions only for metadata
  – Chunkserver interactions for data
• Only cache metadata information
  – Data is too large to cache.
Chunk Locations
• The Master does not keep a persistent record of the locations of chunks and replicas.
• It polls chunkservers for this at startup, and when new chunkservers join/leave.
• It stays up to date by controlling the placement of new chunks and through HeartBeat messages (when monitoring chunkservers)
Atomic commit protocols
• one-phase atomic commit protocol
  – the coordinator tells the participants whether to commit or abort
  – what is the problem with that?
  – this does not allow one of the servers to decide to abort – it may have discovered a deadlock or it may have crashed and been restarted
• two-phase atomic commit protocol
  – is designed to allow any participant to choose to abort a transaction
  – phase 1 – each participant votes. If it votes to commit, it is prepared. It cannot change its mind. In case it crashes, it must save its updates in permanent store
  – phase 2 – the participants carry out the joint decision. The decision could be commit or abort – participants record it in permanent store
Two phase commit (2PC)
[Figure: phase 1 – the coordinator asks every server “What is your result?”; phase 2 – the coordinator sends the final consensus back to every server]
Failure model
• Commit protocols are designed to work in
  – an asynchronous system (e.g. messages may take a very long time)
  – servers/coordinator may crash
  – messages may be lost
  – assume corrupt and duplicated messages are removed
  – no byzantine faults – servers either crash or they obey their requests
• 2PC is an example of a protocol for reaching a consensus
  – because crash failures of processes are masked by replacing a crashed process with a new process whose state is set from information saved in permanent storage and information held by other processes.
2PC
• voting phase: the coordinator asks all servers if they can commit
  – if yes, a server records its updates in permanent storage and then votes
• completion phase: the coordinator tells all servers to commit or abort
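A toy, failure-free rendering of the two phases (all names hypothetical): the coordinator collects canCommit votes, and a single NO forces doAbort for everyone:

#include <stdio.h>

enum vote { VOTE_ABORT, VOTE_COMMIT };

static enum vote
can_commit(int id)
{
    /* a real participant votes COMMIT only once its updates are in
       permanent storage (it is then "prepared" and cannot back out) */
    return id == 3 ? VOTE_ABORT : VOTE_COMMIT;
}

int
main(void)
{
    int n = 4, commit = 1;

    for (int id = 0; id < n; id++)      /* phase 1: collect votes */
        if (can_commit(id) == VOTE_ABORT)
            commit = 0;
    for (int id = 0; id < n; id++)      /* phase 2: joint decision */
        printf("server %d: %s\n", id, commit ? "doCommit" : "doAbort");
    return 0;
}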
[Figure: 2PC state machines. Coordinator: INIT → (send canCommit) → WAIT → ABORT or COMMIT. Participant: INIT → (vote) → READY → ABORT on doAbort, or COMMIT on doCommit followed by haveCommitted]
Failures
• Some servers missed “canCommit”.
• The coordinator missed some “votes”.
• Some servers missed “doAbort” or “doCommit”.
Failures/Crashes
• Some servers crashed before/after “canCommit”.
• The coordinator crashed before/after receiving some “votes”.
• Some servers crashed before/after receiving “doAbort” or “doCommit”.
[Figure: the same 2PC state machines]
Assume the coordinator crashed after the “canCommit” messages have been sent:
(0) Some servers have not received the vote requests.      → WAIT/INIT
(1) All good servers are in the WAIT state.                → WAIT/???
(2) Some servers are in either the ABORT or COMMIT state.  → ABORT/COMMIT
(3) All servers are in either the ABORT or COMMIT state.   → ABORT/COMMIT
3PC (Skeen & Stonebraker, 1983)
[Figure: 3PC participant states: INIT → (vote) → WAIT (Uncertain; may end Aborted/ABORT) → Pre-COMMIT (Committable) → (ACK) → COMMIT (Committed)]
Consistency
• Read: are we reading the fresh copy?
• Write: have we updated all the copies?
• Failure/Partition
NFS, AFS, & GFS
• Read the newest copy
  – NFS: as long as we don’t cache…
  – AFS: the read callback might be broken…
  – GFS: we can check the master…
NFS, AFS, & GFS
• Write to the master copy
  – NFS: write through
  – AFS: you have to have the callback…
  – GFS: we will update all three replicas, but what happens if a read occurs while we are updating the copies? (namespace and file-to-chunks mapping)
Pessimistic versus Optimistic
• Locking the world while I am updating it…
  – Scalable?
  – “Serializable schedule”
  – Assuming a closed system – “you can NOT fail at this moment, as I am updating this particular transaction” – we can use a “log” to handle the issue of “atomicity” (all or nothing).
Soft Update
• Create X (t1) and Delete Y (t2)
• Either order, T1, T2 or T2, T1, is acceptable
• We only enforce an order when resolving the problem of circular dependency!
SU & Background FSCK
• Soft Update guarantees that the file system is in a “consistent” state at all times.
• Essentially, Soft Update prevents any chance of inconsistency!
An Alternative Approach
• “Optimistic”
• The chance for certain bad things to occur is very small (depending on what you are doing).
• And it is very expensive to pessimistically prevent them.
• “Let it happen, and we try to detect and recover from it…”
The Optimistic Approach
• Regular execution with “recording”
• Conflict detection based on the “record”
• Conflict resolution
Example
• Allowing “inconsistencies”
  – Without Soft Updates, we have to run FSCK in the foreground (before we can use the FS).
  – I.e., we try to eliminate “inconsistencies”
• But do we really need a “perfectly consistent FS” for all applications?
  – Why not take a snapshot and then do the background FSCK anyway!
Optimistic?
• NFS
• AFS
• GFS
Optimistic?
• NFS
  – If the server changes the access mode in the middle of an open session from a client…
• AFS
  – The “Callback” is the check for inconsistencies.
• GFS
  – Failure in the middle of a write/append
CODA
• Server Replication:
  – if one server goes down, I can get another.
• Disconnected Operation:
  – if all go down, I will use my own cache.
Disconnected Operation
• Continue critical work when the repository is inaccessible.
• Key idea: caching data.
  – Performance
  – Availability
• Server Replication
Consistency
• If John updates file X on server A and Mary reads file X on server B…
• Read-one & Write-all
Read x & Write (N-x+1)
[Figure: a read contacts x replicas; a write updates N-x+1 replicas]
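The arithmetic behind the rule, as a quick check: with N replicas, a read quorum of x and a write quorum of N - x + 1 always overlap in at least one replica, so some replica in every read quorum holds the newest version:

#include <stdio.h>

int
main(void)
{
    int N = 7;

    for (int x = 1; x <= N; x++) {
        int R = x, W = N - x + 1;
        /* R + W = N + 1, so the overlap R + W - N is always >= 1 */
        printf("N=%d R=%d W=%d overlap=%d\n", N, R, W, R + W - N);
    }
    return 0;
}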
Example: R3W4 (6+1)
[Table: the version numbers held at each of the seven replicas after the operations Initial, Alice-W, Bob-W, Alice-R, Chris-W, Dan-R, Emily-W, Frank-R; each write reaches only a write quorum of replicas, yet every read quorum still includes at least one replica with the newest version]
Recovery
The paper describes two recovery experiments on a cluster containing 15,000-16,000 chunks with 600-660 GB of data.
(a) One chunkserver was killed. Recovery time was 23 minutes. Describe the process.
(b) Two chunkservers were killed. Explain how the recovery process changed.
(c) Suppose that the master server had been killed. How would the recovery have taken place?
Consistency Model
(a) What is a consistency model?
(b) How does the consistency model used by the Google File System differ from the model that might be used by a general-purpose file system?
(c) How does the file system guarantee correctness of the namespace?
(d) Explain what happens, from a consistency viewpoint, when an application attempts to append to a file.
[Figure: file service architecture. Application programs on a client computer go through a client module; the server computer provides a directory service (Lookup, AddName, UnName, GetNames) and a flat file service (Read, Write, Create, Delete, GetAttributes, SetAttributes)]