File Systems - Wichita State University

Chapter 6
File Systems
6.1 Files
6.2 Directories
6.3 File system implementation
6.4 Example file systems
1
Files
• Requirements for long term information storage:
1.Must store large amounts of data
2.Information stored must survive the termination of
the process using it
3.Multiple processes must be able to access the
information concurrently
• Solution: Store information on disk or other
external media in units called files.
• The file system is a part of operating system that
manages files.
2
Files
• File naming - file name.file extension
• File structure
– unstructured sequence of bytes (MS-DOS/UNIX).
– record sequence (CP/M).
– B-tree.
3
File Naming
Typical file extensions.
4
File Structure
• Three kinds of files
– byte sequence
– record sequence
– tree
5
Files
• File types
– Regular files contain user information either ASCII
or binary.
– Directories are system files for maintaining the
structure of the file system.
– Character special files are used to model serial I/O
devices such as terminals, printers, and networks.
– Block special files are used to model disks.
• A UNIX executable file starts with a magic
number, identifying the file as an executable
file.
6
File Types
(a) An executable file (b) An archive
7
File Access
• Sequential access
– read all bytes/records from the beginning
– cannot jump around, could rewind or back up
– convenient when medium was magnetic tape
• Random access
– bytes/records read in any order
– essential for database systems
– Two methods are used for specifying where to start
reading.
• read and then move file marker
• move file marker (seek), then read
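The "seek, then read" method can be sketched in a few lines. This is only an illustration: the helper name read_record, the 4-byte record size, and the demo file are assumptions made for the example, not part of the slides.

```python
import os
import tempfile

def read_record(path, record_size, index):
    """Return record number `index` by seeking to it, then reading."""
    with open(path, "rb") as f:
        f.seek(index * record_size)   # move the file marker
        return f.read(record_size)    # read from the new position

# Build a small demo file holding four 4-byte records.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"AAAABBBBCCCCDDDD")

# Records can be fetched in any order, without reading what precedes them.
third = read_record(demo_path, 4, 2)
first = read_record(demo_path, 4, 0)
os.remove(demo_path)
```

With sequential access the only way to reach the third record would be to read the first two on the way.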
8
File Attributes
• Operating systems associate extra information with
each file, called file attributes.
Possible file attributes
9
File Operations
1. Create
2. Delete
3. Open
4. Close
5. Read
6. Write
7. Append
8. Seek
9. Get attributes
10.Set Attributes
11.Rename
10
An Example Program Using File System Calls
11
An Example Program Using File System Calls
12
Memory-Mapped Files (Ex11.c)
• To facilitate access to files, some systems provide system
calls to map files into the address space of a running
process and to remove (unmap) them from the address
space.
• The file serves as a backing store for the mapped region;
when the process finishes, all mapped, modified pages
are written back to their files.
– Advantage: eliminates the need for explicit read and write calls.
– Disadvantages:
• It is difficult to know the exact size of the output file. If the
last page is all zeroes, were 10 0's written or 100 0's?
• A file mapped and modified by one process may look different
to another process that reads it in the conventional way.
The two processes need to see consistent views of the file.
• The file may be too large to fit in the address space.
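As a rough illustration of mapping and unmapping, the sketch below uses Python's standard mmap module on a temporary file. The file contents are made up for the demo, and this is not the Ex11.c program mentioned in the slide title.

```python
import mmap
import os
import tempfile

# Create a small file to map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello world")

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:   # length 0 maps the whole file
        m[0:5] = b"HELLO"                  # ordinary memory write, no write() call
        m.flush()                          # push modified pages back to the file
    # leaving the inner `with` unmaps the file

with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
```

Note that the mapping has a fixed length here, which is one face of the "size of the output file" problem listed above.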
13
Memory-Mapped Files
(a) Segmented process before mapping files
into its address space
(b) Process after mapping
existing file abc into one segment
creating new segment for xyz
14
Directories
• File systems have directories or folders to keep track
of files.
– A single-level directory has one directory (root) containing
all the files.
– A two-level directory has a root directory and user directories.
– A hierarchical directory has a root directory and arbitrary
number of subdirectories.
• Two different methods are used to specify file names in
a directory tree:
– Absolute path name consists of the path from the root
directory to the file.
– Relative path name consists of the path from the current
directory (working directory).
15
Directories - A single level
directory system
• A single level directory system
– contains 4 files
– owned by 3 different people, A, B, and C
16
Two-level Directory Systems
Letters indicate owners of the directories and files
17
Hierarchical Directory Systems
A hierarchical directory system
18
Directories
• The path name would be written:
– Windows \usr\ast\mailbox
– UNIX /usr/ast/mailbox
– MULTICS >usr>ast>mailbox
• Dot and dot dot are two special entries in the file
system.
– Dot (.) refers to the current directory.
– Dot dot (..) refers to its parent.
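A small sketch of how relative names, dot, and dot dot combine with the working directory, using UNIX-style paths. The helper name resolve and the sample paths are assumptions for illustration. The resolution is purely lexical; a real file system walks the directories and would also follow links.

```python
import posixpath

def resolve(working_dir, path):
    """Turn a UNIX-style path into an absolute path with . and .. removed."""
    if not posixpath.isabs(path):
        # A relative path name starts from the working directory.
        path = posixpath.join(working_dir, path)
    return posixpath.normpath(path)

a = resolve("/usr/ast", "mailbox")          # relative name
b = resolve("/usr/ast", "../lib/./dict")    # uses .. and .
c = resolve("/usr/ast", "/etc/passwd")      # already absolute
```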
19
Path Names
A UNIX directory tree
20
Directory Operations
1. Create
2. Delete
3. Opendir
4. Closedir
5. Readdir
6. Rename
7. Link
8. Unlink
21
File System Implementation
• File system layout:
– MBR (Master Boot Record) is used to boot the computer.
– The partition table gives the starting and ending addresses of
each partition.
– Partitions:
• The first block, the boot block, of the active partition is read in by the
MBR program when the system is booted.
• The superblock contains all the key parameters about the file system.
• Free-block information
• I-nodes, one per file, tell all about the file.
• Root directory
• Directories and files
22
File System Implementation
A possible file system layout
23
File System Implementation
• Implementing file storage means keeping track of which
disk blocks go with which files.
• Contiguous allocation - store each file as a contiguous
run of disk blocks.
– Advantages:
• Simple to implement
• Read performance is excellent
– Disadvantages:
• Disk fragmentation
• The maximum file size must be known when the file is created.
– Examples: CD-ROMs, DVDs, and other write-once optical media
• Linked list allocation - keep each file as a linked list of disk blocks
– Disadvantages:
• Random access is slow.
• The amount of data in a block is no longer a power of 2, because
the pointer to the next block takes up part of each block.
24
Implementing Files
(a) Contiguous allocation of disk space for 7 files
(b) State of the disk after files D and E have been removed
25
Implementing Files
Storing a file as a linked list of disk blocks
26
File System Implementation
• Linked list allocation using an index - take the
pointer word from each disk block and put it
in an index table, the FAT (File Allocation Table),
in memory.
– Disadvantage - the entire table must be in memory all
the time.
• An i-node (index-node) lists the attributes and disk
addresses of the file's blocks.
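To make the FAT idea concrete, here is a minimal sketch of following a chain through an in-memory table. The block numbers, the END marker, and the function name file_blocks are invented for illustration.

```python
END = -1  # hypothetical end-of-chain marker

def file_blocks(fat, first_block):
    """Follow the chain in the table, starting from a file's first block."""
    blocks = []
    block = first_block
    while block != END:
        blocks.append(block)
        block = fat[block]    # the table entry holds the next block number
    return blocks

# fat[b] is the block that follows b in its file (illustrative numbers).
fat = {4: 7, 7: 2, 2: 10, 10: 12, 12: END,
       6: 3, 3: 11, 11: 14, 14: END}

file_a = file_blocks(fat, 4)
file_b = file_blocks(fat, 6)
```

The chain is still walked link by link, but every step is a memory reference rather than a disk read, which is what makes random access tolerable.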
27
Implementing Files
Linked list allocation using a file allocation table in RAM
28
Implementing Files
An example i-node
29
Implementing Directories
• When a file is opened, the file system uses the
path name to locate the directory entry.
• The directory provides information needed to
find the disk blocks.
– disk address of the entire file (contiguous blocks)
– the number of first block (linked list)
– the number of i-node (i-node)
• Where to store attributes? In directory or i-node?
30
Implementing Directories
(a) A simple directory – MS-DOS/Windows
fixed size entries
disk addresses and attributes in directory entry
(b) Directory in which each entry just refers to an i-node - UNIX
31
Implementing Directories
• Handling long file names in a directory:
– Fixed-length names (Waste space)
– In-line (When a file is removed, a variable-sized gap
is introduced.)
– Heap (The heap management needs extra effort.)
• How to search files in each directory?
– Linearly
– Hash table
– Cache the results of searches
32
Implementing Directories
• Two ways of handling long file names in directory
– (a) In-line
– (b) In a heap
33
Shared files
• A shared file is used to allow a file to appear in
several directories.
• The connection between a directory and the
shared file is called a link. The file system is a
Directed Acyclic Graph (DAG).
• Problem: If directories contain disk addresses, a
copy of the disk addresses will have to be made in
directory B. If either A or B then appends to the file, the
new blocks will appear only in the directory of the user doing the append.
34
Shared files
• Solutions:
– Do not list disk block addresses in directories but in
a little data structure associated with the file itself
(the i-node); directories point to it (hard link).
– Create a new file of type link which contains the
path name of the file to which it is linked
(symbolic linking).
35
Shared Files
File system containing a shared file
36
Shared files
• ln file1 file2
• Problem with hard links - when should the i-node be removed?
Suppose A runs: rm file2
=> the system could set count = 1 and leave the i-node intact;
when count = 0, delete the file and the i-node.
• The problem above does not occur with symbolic links because
only the owner's directory has a pointer to the i-node. The
cost is extra overhead in traversing the path.
• Another problem is that a file with several links may
get copied multiple times when files are dumped with a program such as tar.
– Remedy: do not descend paths involving symbolic links.
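The link-count behavior can be observed directly. The sketch below assumes a POSIX system whose temporary directory supports hard and symbolic links; the file names are placeholders.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
file1 = os.path.join(workdir, "file1")
file2 = os.path.join(workdir, "file2")
soft = os.path.join(workdir, "soft")

with open(file1, "w") as f:
    f.write("shared data")

os.link(file1, file2)       # hard link, like `ln file1 file2`
os.symlink(file1, soft)     # symbolic link: a small file holding a path name
count_after_link = os.stat(file1).st_nlink   # the symlink is not counted

os.remove(file1)            # one name goes away, the i-node stays
count_after_rm = os.stat(file2).st_nlink
with open(file2) as f:
    data_via_hard_link = f.read()
soft_still_resolves = os.path.exists(soft)   # follows the link to the old path

# Clean up: removing the last hard link lets the system free the i-node.
os.remove(file2)
os.remove(soft)
os.rmdir(workdir)
```

The hard link keeps the data alive after the first name is removed, while the symbolic link is left dangling because it stores only a path name.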
37
Shared Files
(a) Situation prior to linking
(b) After the link is created
(c) After the original owner removes the file
38
Disk space management
• Strategies for storing an n-byte file:
– Allocate n consecutive bytes of disk space (like segmentation).
• Problem: if the file grows, it will have to be moved
on the disk, which is an expensive operation and causes
external fragmentation.
– Allocate ⌈n/k⌉ blocks of size k bytes each (like paging).
• => Nearly all file systems chop files up into fixed-size blocks
that need not be adjacent.
39
Block size
• When the block size increases, disk space utilization
decreases (space efficiency drops because of internal
fragmentation).
• When the block size decreases, the data transfer rate
decreases (time efficiency drops).
• Usual sizes: k = 512 bytes, 1 KB (UNIX), or 2 KB
40
Disk Space Management
Block size
• Dark line (left hand scale) gives data rate of a disk
• Dotted line (right hand scale) gives disk space efficiency
• All files 2KB
41
Block size
• Example: disk with 131072 bytes per track,
rotation time = 8.33 msec,
average seek time = 10 msec.
=> time to read a block of k bytes
= 10 + 8.33/2 + (k/131072) * 8.33
= 10 + 4.165 + (k/131072) * 8.33 msec
If k = 1 KB = 1024 bytes:
= 14.165 + (1024/131072) * 8.33
= 14.165 + 0.065
= 14.23 msec
• Disk space efficiency = % of block used by data.
– Observation: Assume that all files are 1 kbytes, on average 1/2 of last
block is empty.
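The trade-off can be tabulated with the same disk parameters. The sketch below reuses the formula from the example; the function names and the 2-KB file size used for the space-efficiency side are choices made for illustration.

```python
import math

SEEK_MS = 10.0         # average seek time from the example
ROTATION_MS = 8.33     # rotation time from the example
TRACK_BYTES = 131072   # bytes per track from the example

def read_time_ms(k):
    """Seek + half a rotation + transfer time for a k-byte block."""
    return SEEK_MS + ROTATION_MS / 2 + (k / TRACK_BYTES) * ROTATION_MS

def data_rate_bytes_per_sec(k):
    return k / (read_time_ms(k) / 1000.0)

def blocks_needed(n, k):
    """Number of k-byte blocks required for an n-byte file."""
    return math.ceil(n / k)

def space_efficiency(n, k):
    """Fraction of the allocated space that actually holds file data."""
    return n / (blocks_needed(n, k) * k)

t_1k = read_time_ms(1024)
table = [(k, round(data_rate_bytes_per_sec(k)), space_efficiency(2048, k))
         for k in (512, 1024, 2048, 4096, 8192)]
```

Each row of table holds a block size, the resulting data rate, and the space efficiency for a 2-KB file; the data rate rises with k while the efficiency falls once k exceeds the file size.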
42
Keeping Track of Free Blocks
• Use a linked list of disk blocks: each block holds as
many free disk block numbers as will fit.
– With 1-KB blocks and 32-bit disk block numbers: 1024 *
8/32 = 256 disk block numbers per block => 255 free blocks and 1
next-block pointer.
• Use a bit map: a disk with n blocks requires a bit map
with n bits.
– Free blocks are represented by 1's.
– Allocated blocks are represented by 0's.
– A 16-GB disk has 2^24 1-KB blocks and requires 2^24 bits => 2048 blocks.
– Using a linked list takes 2^24/255 ≈ 65793 blocks. However, these
blocks can be freed up as the disk fills up.
• A bit map is generally better if it can be kept completely in
memory.
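The arithmetic above, plus a toy bit map, can be checked with a short sketch. The class name FreeBitmap and its methods are hypothetical. The exact linked-list figure rounds up to 65794 blocks, one more than the rounded quotient quoted above.

```python
BLOCK_BYTES = 1024
BLOCK_NUMBER_BITS = 32
DISK_BLOCKS = 2**24          # 16-GB disk with 1-KB blocks

numbers_per_block = BLOCK_BYTES * 8 // BLOCK_NUMBER_BITS
free_entries_per_block = numbers_per_block - 1            # one slot is the next pointer
list_blocks = -(-DISK_BLOCKS // free_entries_per_block)   # ceiling division
bitmap_blocks = DISK_BLOCKS // (BLOCK_BYTES * 8)

class FreeBitmap:
    """Toy bit map: bit i is 1 when block i is free."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray([0xFF] * ((n_blocks + 7) // 8))  # all free

    def is_free(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the lowest-numbered free block and mark it allocated."""
        for block in range(self.n_blocks):
            if self.is_free(block):
                self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
                return block
        raise RuntimeError("disk full")

    def free(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

bm = FreeBitmap(10)
first = bm.allocate()
second = bm.allocate()
bm.free(first)
reused = bm.allocate()
```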
43
Disk Space Management
(a) Storing the free list on a linked list
(b) A bit map
44
Disk Space Management
(a) Almost-full block of pointers to free disk blocks in
RAM
- three blocks of pointers on disk
(b) Result of freeing a 3-block file
(c) Alternative strategy for handling 3 free blocks
- shaded entries are pointers to free disk blocks
45
Disk Space Management
Quotas for keeping track of each user’s disk use
46
File System Reliability
• The loss of a file system can be catastrophic.
• Methods to safeguard a file system:
– Bad Block Management
– Backups
• Bad Block Management
– Hardware solution - dedicate a sector to a "bad block list".
When the disk controller is initialized, the bad block list is read and
a spare block is picked to replace each bad block. The
mapping is recorded in the bad block list.
– Software solution - the user or file system carefully constructs a
file containing all the bad blocks.
47
File System Reliability
• Backups are made to handle two problems: recovering from
disaster and recovering from stupidity.
• Considerations of backups
– Entire or part of the file system
– Incremental dumps: dump only files that have
changed
– Compression
– Backup an active file system
– Security
48
File System Reliability
• Two strategies can be used for dumping a disk to
tape:
– Physical dump: starts at block 0 and writes every disk block, in order, through the last one.
• Advantages: simple and fast
• Disadvantages: backs up everything, including unused blocks
– Logical dump: starts at one or more specified
directories and recursively dumps all files and
directories found that have changed since some given
base date.
49
File System Reliability
• A file system to be dumped
– squares are directories, circles are files
– shaded items have been modified since the last dump; unshaded items have not changed
– each directory and file is labeled by its i-node number
50
File System Reliability
• Bit maps used by the logical dumping algorithm
– After 4 phases, the dump is complete.
51
File System Consistency
• A utility program, called a file system checker
(fsck in UNIX or scandisk in Windows), can be
used to test the consistency of a file system.
• Two types of consistency checks can be made:
(a) blocks (b) files (directory)
52
File System Consistency
• Block consistency:
– Build two tables with a counter per block, initially
set to 0
• The counters in the first table keep track of number of
times each block is present in a file.
• The counters in the second table record the number of
times each block is present in the free list.
– Then, the program reads all the i-nodes and uses the
i-nodes to build a list of all blocks used in the files
(incrementing file counter as each block is read).
– Check free list or bit map to find all blocks not in
use (increment free list counter for each block in
free list).
53
File System Reliability
• File system states
(a) consistent
(b) missing block – add it to the free list
(c) duplicate block in free list – rebuild the free list
(d) duplicate data block – copy the block to a free block
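The two-table block check and the four states can be sketched as follows. The function name, the toy disk of 8 blocks, and the in-memory representation of files and free list are assumptions for illustration; a real checker reads i-nodes and the free list or bit map from disk.

```python
def check_blocks(n_blocks, files, free_list):
    """Count each block's appearances in files and in the free list."""
    in_use = [0] * n_blocks   # first table: times each block appears in a file
    free = [0] * n_blocks     # second table: times each block is in the free list
    for blocks in files.values():
        for b in blocks:
            in_use[b] += 1
    for b in free_list:
        free[b] += 1

    problems = {}
    for b in range(n_blocks):
        if in_use[b] == 0 and free[b] == 0:
            problems[b] = "missing block: add it to the free list"
        elif in_use[b] > 1:
            problems[b] = "duplicate data block: copy it to a free block"
        elif free[b] > 1:
            problems[b] = "duplicate in free list: rebuild the free list"
        elif in_use[b] == 1 and free[b] == 1:
            # Not one of the states listed above; included as an assumption.
            problems[b] = "in use and free: remove it from the free list"
    return problems

# Toy disk: block 2 is in two files, block 5 is listed free twice,
# and block 7 appears nowhere.
files = {"a": [0, 1, 2], "b": [2, 3]}
free_list = [4, 5, 5, 6]
problems = check_blocks(8, files, free_list)
```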
54
File System Consistency
• For checking directories - keep a table of counters, one per file. Starting
at the root directory, recursively inspect each directory. For each
file, increment the counter for that file's i-node.
• Compare the computed value (the number of directory entries
pointing to the i-node) with the link count stored in each i-node.
– i-node link count > computed value
• Even if all the files are removed from the directories, the link count stays > 0, so
the i-node will never be removed.
• Solution: set i-node link count = computed value
– i-node link count < computed value
• The i-node may be freed even though another
directory entry still points to it.
• That directory will then be pointing to an unused
i-node.
• Solution: set i-node link count = computed value
• Protecting the user - rm * .o
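The directory check described on this slide (counting directory entries per i-node and comparing with the stored link counts) might look like the sketch below. The tree layout, the i-node numbers, and the function names are made up. Directory link counts for "." and ".." are ignored to keep the example small.

```python
def count_entries(tree, dir_inode, counts):
    """Recursively count directory entries that point at each i-node."""
    for name, inode in tree[dir_inode].items():
        counts[inode] = counts.get(inode, 0) + 1
        if inode in tree:                 # the entry is itself a directory
            count_entries(tree, inode, counts)
    return counts

def link_count_fixes(tree, root_inode, stored_counts):
    """Return {i-node: corrected link count} for every mismatch."""
    computed = count_entries(tree, root_inode, {})
    return {inode: computed.get(inode, 0)
            for inode, stored in stored_counts.items()
            if stored != computed.get(inode, 0)}

# Directory i-node -> {entry name: i-node number} (illustrative numbers).
tree = {
    1: {"usr": 2, "notes": 10},
    2: {"ast": 3, "notes2": 10, "b": 12},
    3: {"mbox": 11, "a": 12},
}
# Link counts as stored in the file i-nodes: 10 is too high, 12 is too low.
stored = {10: 3, 11: 1, 12: 1}
fixes = link_count_fixes(tree, 1, stored)
```

In both error cases the repair is the same: overwrite the stored count with the computed one.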
55
File System Performance
• A block cache or buffer cache is a collection of blocks
that logically belong on the disk, but are kept in
memory to improve performance.
– All of the page replacement algorithms described earlier can be
used to determine which block should be written back when a new
block is needed and the cache is full.
• Modified LRU scheme:
– Is the block likely to be needed again soon? No => it goes to the
front of the LRU list, so its buffer is reused quickly.
– Is the block essential to the consistency of the file system
(e.g., i-nodes)? Yes => write it to disk immediately.
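A compact sketch of a block cache with the modified LRU rules follows. The class name BlockCache, the critical and reuse_soon flags, and the dictionary standing in for the disk are all assumptions for illustration, not a description of any particular system.

```python
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict standing in for the disk
        self.blocks = OrderedDict()   # front = next to be evicted
        self.dirty = set()            # modified blocks not yet on disk

    def _evict(self):
        victim, data = self.blocks.popitem(last=False)
        if victim in self.dirty:      # write back only if modified
            self.disk[victim] = data
            self.dirty.discard(victim)

    def _insert(self, block, data, reuse_soon):
        if block not in self.blocks and len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block] = data
        # Blocks unlikely to be needed again go to the front of the list.
        self.blocks.move_to_end(block, last=reuse_soon)

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return self.blocks[block]
        data = self.disk[block]
        self._insert(block, data, True)
        return data

    def write(self, block, data, critical=False, reuse_soon=True):
        self._insert(block, data, reuse_soon)
        if critical:                  # e.g. an i-node block: write at once
            self.disk[block] = data
            self.dirty.discard(block)
        else:
            self.dirty.add(block)

disk = {0: b"a", 1: b"b", 2: b"c"}
cache = BlockCache(2, disk)
cache.write(0, b"A")                  # stays in the cache for now
disk_before_eviction = disk[0]
cache.write(1, b"I", critical=True)   # written through immediately
cache.read(2)                         # cache full: block 0 is evicted
```

The ordinary block reaches the disk only when it is evicted, while the critical block is on disk as soon as the write returns.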
56
File System Performance
• Periodically, all modified data blocks should be written out, so that data
does not sit in the cache too long (e.g., a user who writes all day).
• UNIX - the system call sync forces all modified blocks out to
the disk immediately. Hard-disk oriented. E.g., the update program
runs in the background and calls sync every 30 seconds.
• MS-DOS - write-through cache => all modified
blocks are written to disk immediately. Floppy-disk oriented.
• Example: writing a 1-KB block one character at a time
– UNIX collects the characters together in the cache
– MS-DOS writes them to disk one at a time
57
File System Performance
The block cache data structures
58
File System Performance
• Reading a file block needs one disk access for the i-node and
one for the block. Placing i-nodes close to their blocks saves i-node access time.
(a) I-nodes placed at the start of the disk
(b) Disk divided into cylinder groups
– each with its own blocks and i-nodes
59
Log-Structured File Systems
• With CPUs faster and memories larger
– disk caches can also be larger
– an increasing number of read requests can be satisfied from the
cache
– thus, most disk accesses will be writes
• LFS Strategy structures entire disk as a log
– have all writes initially buffered in memory
– periodically write these at the end of the disk log
– when file opened, locate i-node by the i-node map,
then find blocks
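The LFS strategy can be illustrated with a toy in-memory log. Everything here (the class TinyLFS, the tuple layout of log entries, the absence of segments and of a cleaner) is a simplifying assumption rather than a description of a real log-structured file system.

```python
class TinyLFS:
    def __init__(self):
        self.log = []          # the disk, written only by appending
        self.inode_map = {}    # i-node number -> log index of the latest i-node
        self.buffer = []       # writes held in memory until the next flush

    def write(self, inode_no, data_blocks):
        self.buffer.append((inode_no, list(data_blocks)))

    def flush(self):
        """Append all buffered writes to the end of the log in one go."""
        for inode_no, data_blocks in self.buffer:
            addresses = []
            for block in data_blocks:
                self.log.append(("data", block))
                addresses.append(len(self.log) - 1)
            self.log.append(("inode", addresses))
            self.inode_map[inode_no] = len(self.log) - 1
        self.buffer.clear()

    def read(self, inode_no):
        """Locate the i-node through the i-node map, then fetch its blocks."""
        _kind, addresses = self.log[self.inode_map[inode_no]]
        return [self.log[a][1] for a in addresses]

fs = TinyLFS()
fs.write(7, [b"v1"])
fs.flush()
fs.write(7, [b"v2a", b"v2b"])   # a new version never overwrites the old one
fs.flush()
latest = fs.read(7)
log_length = len(fs.log)
```

The old copy of the file stays in the log as garbage; reclaiming that space is a separate job that this sketch leaves out.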
60
Example File Systems
CD-ROM File Systems
The ISO 9660 directory entry
61
The CP/M File System
Memory layout of CP/M
62
The CP/M File System
The CP/M directory entry format
63
The MS-DOS File System
The MS-DOS directory entry
64
The MS-DOS File System
• Maximum partition for different block sizes
• The empty boxes represent forbidden combinations
65
The Windows 98 File System
The extended MS-DOS directory entry used in Windows 98
66
The Windows 98 File System
An entry for (part of) a long file name in Windows 98
67
The Windows 98 File System
An example of how a long name is stored in Windows 98
68
The UNIX V7 File System
A UNIX V7 directory entry
69
The UNIX V7 File System
A UNIX i-node
70
The UNIX V7 File System
The steps in looking up /usr/ast/mbox
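The lookup alternates between i-nodes and directory blocks. The sketch below walks /usr/ast/mbox through a tiny in-memory model; the i-node numbers, block numbers, and extra directory entries are invented for illustration and are not taken from the figure.

```python
ROOT_INODE = 1   # hypothetical i-node number of the root directory

# Directory i-node -> disk block holding that directory's entries.
inodes = {1: 1, 6: 132, 26: 406}

# Disk block -> directory contents (entry name -> i-node number).
blocks = {
    1:   {"usr": 6, "bin": 4},
    132: {"ast": 26, "jim": 45},
    406: {"mbox": 60, "src": 81},
}

def lookup(path):
    """Resolve an absolute path one component at a time."""
    inode = ROOT_INODE
    steps = []
    for name in path.strip("/").split("/"):
        directory = blocks[inodes[inode]]   # read the i-node, then its block
        inode = directory[name]             # find the component's i-node
        steps.append((name, inode))
    return inode, steps

mbox_inode, steps = lookup("/usr/ast/mbox")
```

Each path component costs one i-node read and one directory-block read, which is why long path names are expensive to resolve without caching.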
71