File Systems
Parminder Singh Kang
Home: www.cse.dmu.ac.uk/~pkang
Introduction
• File systems are needed:
Because main memory is not big enough to hold all information
To maintain permanent copies of information
• A file system should be device independent, so that programs can use the same
commands with different devices.
• Disks provide the bulk of the secondary storage on which a file system is maintained.
• The advantages of using disks are:
Data can be rewritten in place.
Any given block of information can be accessed directly.
Efficiency and performance improve because data is transferred
in units of blocks instead of byte by byte.
1 File Subsystem
• Provides users/applications with a logical interface
Imposes a uniform structure on storage,
i.e. typically a hierarchical directory structure
Refers to elements by meaningful names
Specifies operations on storage in application terms, e.g. read a real number
• Maps the logical organisation onto the physical storage media
• Hides details of the physical organisation:
Using device drivers allows all I/O to be treated alike
1.1 Device drivers
• A transfer from/to a peripheral requires a series of steps:
Check current status
Initiate status change
Request transfer
Receive notification of completion
1.2 File Subsystem requirements
• Specifying logical file characteristics
• Cataloguing/locating files
• Mapping logical to physical
• Supporting file operations
• Controlling access
1.3 File System Structure
• I/O Control:
Consists of device drivers and interrupt handlers that transfer information
between main memory and the disk system.
The basic function of I/O control is to access a specific location on the device.
• Basic File System:
Its main function is to issue generic commands to the appropriate device driver
to read and write physical blocks on disk.
The basic file system uses the concept of a physical address space. Each block is
identified by a numeric disk address (e.g. drive, cylinder, track, sector)
• File Organization Module:
Keeps track of the file allocation method used and the mapping between logical and physical blocks.
Knowing the type of file allocation and the location of the file,
the file organization module translates logical block addresses to physical addresses.
• Logical File System:
Manages metadata (file structure information, inode information and
file control block information).
All information about a file is held in a file control block (FCB). The FCB includes
the file name, inode, permissions, location of file content etc.
The layers, from highest (user) to lowest (hardware), are:
Application programs
Logical File System
File-Organization Module
Basic File System
I/O Control
Devices
1.4 File System Structure Implementation
• A file system is implemented using structures both on disk and in memory.
• The implementation varies with the operating system and the file system used.
• On Disk Implementation:
Disk Label (VTOC)
Boot Control Block
Primary Super Block
Backup Super Block
Inodes
• In Memory Implementation:
A mount table contains information about each mounted partition.
An in-memory directory structure holds information about recently
accessed directories.
The open file table contains a copy of the inode for each open file.
Question: what are the on-disk and in-memory file system structures, and why are both needed?
2. UNIX file characteristics
• Structure
A file is a flat sequence of bytes
UNIX imposes no structure
Other operating systems sometimes do
• Naming conventions
A name is a sequence of characters
Maximum length can vary
• Type
UNIX does not infer type from name
UNIX has different file types
Regular
Directories
Links (hard or soft/symbolic)
Device
• Organisation
Defines whether the file is accessed sequentially or randomly
Both are supported by the current offset pointer
• Access
Defines who can do what to the file
Records when it was last done
3. Physical Storage
• Physical storage is mainly
hard disks
CD-ROM/DVD
Floppy disks
Magnetic tape (backup)
4. UNIX File System
• tree structured
• only one tree (one root)
• may span multiple disks
i.e. it uses a device independent hierarchical structure, which is regarded as a tree:
root
├── bin
├── etc
├── dev
├── usr
└── user
    └── user files
Device file systems can be attached to the tree using the program /etc/mount
• Once a device is mounted, its files can be accessed using a directory name;
the device does not have to be known, e.g.
cp /user/test.dat mytest.dat
copies a file to the current directory.
This has the advantage that a file system can be moved to a different device without
the programs which use it needing modification.
• MS-DOS and Windows, on the other hand, are not device independent,
i.e. one has to use device names, e.g.
copy a:test.dat c:\user
4.1 Types of file
• UNIX has:
regular files - users' programs and data, etc.
directories
special files - I/O devices, e.g. /dev/tty and /dev/hd1
• MS-DOS/Windows has:
regular files - users' programs and data, etc.
directories
special files - prn:, con:, com1:, etc.
4.2 File names
• Traditional UNIX allows up to 14 characters in a filename (later versions typically
allow 255), with combinations of name and extension as required, e.g. test.data.
• In UNIX the extension is solely for the programmer's convenience,
i.e. to identify types of files at a logical level,
e.g. a user may end all data files with .data
4.3 Using files
• Two basic operations are needed: read and write.
• At a program level one usually has a set of language, library or
system calls to access files
4.4 Directories (only the OS can write into a directory. Justify?)
• A directory is a special system file which contains details of other files
• It provides a logical interface for the user to keep track of files.
• Simple file systems, e.g. CP/M, have one directory per device:
These become large with many users
Name conflicts can occur.
• Alternatives are to have one directory per user (RSX) or
many directories per user (UNIX, MS-DOS, Windows).
• With many directories, files are identified by:
absolute path names (from the root),
e.g. on UNIX /usr1/stf/bb/public/opsys/progs/pipe1.cc
path names relative to the current directory,
e.g. network/programs/terminal or ../../test_data/test.data
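As a minimal sketch of how a relative path name is interpreted from the current directory, the following uses Python's standard posixpath module. The current directory shown is an assumed example, not taken from the notes, and the resolution is purely textual (it does not look at the disk or follow links).

```python
import posixpath


def resolve(current_dir, path):
    """Return the absolute path name that `path` refers to.

    Absolute names (starting at the root, "/") are only tidied up;
    relative names are interpreted from `current_dir`.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(current_dir, path))


# Hypothetical current directory, chosen only for illustration.
cwd = "/usr1/stf/bb/public/opsys"
print(resolve(cwd, "progs/pipe1.cc"))
print(resolve(cwd, "../../test_data/test.data"))
```

Each ".." component moves one level up the tree towards the root before the remaining names are followed downwards.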
4.5 File structure
• At the lowest (physical) level one can read/write blocks from/to a device.
• At a logical level one reads/writes records, where a record can be anything
from a byte (read a character) to N bytes (read a large structure).
4.6 Disk space management
• The surface of a disk is divided into a number of concentric tracks, each of
which is divided into a number of sectors. The unit of reading/writing from/to a
disk is a block (where a block may be anything from a sector to a track).
• Disk I/O transfer time is made up of (latency and transfer time depend upon the rotation speed of the disk):
start-up time (for floppies) ~ 0.5 sec
track seek time (time to move the head to the required track) ~ 5-20 msec for a hard disk
latency (time for the required sector to appear under the head) ~ 0.5-5 msec
transfer time ~ 0.05-0.1 msec for a 1 Kbyte block
• The larger the block size, the faster the transfer rate will be, but the more space is
wasted.
• A disk cache can improve transfer rates significantly.
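The effect of block size on transfer rate can be checked with a little arithmetic. The sketch below simply adds the components listed above. The figures passed in (10 msec seek, 3 msec latency, 0.1 msec per Kbyte) are assumed mid-range values from those ranges, and the function names are made up for illustration.

```python
def access_time_ms(seek_ms, latency_ms, transfer_ms_per_kb, block_kb):
    """Time to read one block: seek + rotational latency + transfer."""
    return seek_ms + latency_ms + transfer_ms_per_kb * block_kb


def throughput_kb_per_s(seek_ms, latency_ms, transfer_ms_per_kb, block_kb):
    """Effective transfer rate when every block needs its own seek."""
    total_ms = access_time_ms(seek_ms, latency_ms, transfer_ms_per_kb, block_kb)
    return block_kb / (total_ms / 1000.0)


# Assumed mid-range figures: 10 msec seek, 3 msec latency, 0.1 msec per Kbyte.
for block_kb in (1, 4, 8, 64):
    rate = throughput_kb_per_s(10, 3, 0.1, block_kb)
    print(f"{block_kb:3d} Kbyte blocks: {rate:8.1f} Kbytes/sec")
```

Seek time and latency dominate for small blocks, which is why larger blocks (and a cache that avoids the disk altogether) give a better rate.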
4.7 Disk Partitions
• To assist in the overall organisation of the file system, large disks can
be split into logical disks by partitioning.
Boot block (Master boot record)
Partition 1
Partition 2 etc
• Each partition can be either “raw”, containing no file system,
or “cooked”, containing a file system.
• A raw disk is used when no file system is appropriate.
For example, UNIX swap space uses a raw partition,
as it uses its own format and does not use a file system.
• Boot information can be stored in a separate partition. (Why?)
• The root partition, which contains the operating system kernel and system files, is mounted
at boot time.
5 Unix File storage
• A file system needs to keep track of
where every file is stored
details of each file
where next block of file is
which blocks are free
which blocks are in use
5.1 Allocating Space to Files
• Allocate a contiguous sequence of blocks
leaves unused small areas (fragmentation)
problem with file growth
• Linked list of blocks
problem with random access
• Index blocks (UNIX uses a version of this)
5.2 UNIX file storage structure
• only O/S can write to a directory
• each entry has inode number and name
┌───────────────┬────────────┐
│ i-node number │ file name  │
└───────────────┴────────────┘
• The command df (disk free) will tell you how many i-nodes are free (df -i on many current systems).
• An i-node contains the following information on the file:
file mode (indicates type of file - normal file, special file, etc.)
number of links to the file (e.g. from other directories)
owner's user id
owner's group id
access permissions for each user type, e.g. read, write, and execute
file size in bytes
times of last access, last modification and last i-node change
location of the first 10 blocks (a file of up to 10 blocks needs no further addresses)
addresses of the single, double and triple indirect blocks
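Most of the i-node fields listed above can be inspected from a program. The sketch below uses Python's standard os.stat call on a POSIX system. The function name describe and the dictionary keys are made up for illustration, and the block addresses themselves are not exposed through this interface.

```python
import os
import stat


def describe(path):
    """Return the i-node information that os.stat reports for `path`."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                   # i-node number
        "mode": stat.filemode(st.st_mode),    # file type and permissions
        "links": st.st_nlink,                 # number of links to the file
        "uid": st.st_uid,                     # owner's user id
        "gid": st.st_gid,                     # owner's group id
        "size": st.st_size,                   # file size in bytes
        "atime": st.st_atime,                 # time of last access
        "mtime": st.st_mtime,                 # time of last modification
        "ctime": st.st_ctime,                 # time of last i-node change (on UNIX)
    }


if __name__ == "__main__":
    for key, value in describe(".").items():
        print(f"{key:6s} {value}")
```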
5.3 Locating a file
• To locate a file's data requires the following loop:
i-node - data - i-node - data ...
• Note:
The root i-node is conventionally i-node 2
Each directory has entries for . (current directory) and .. (parent directory)
    root dir            user dir            staff dir           sam dir
 ┌────────────┐  ┌──►┌────────────┐  ┌──►┌────────────┐  ┌──►┌────────────┐
 │            │  │   │            │  │   │            │  │   │            │
 ├────────────┤  │   ├────────────┤  │   ├────────────┤  │   ├────────────┤
┌┤ user       │  │  ┌┤ staff      │  │  ┌┤ sam        │  │  ┌┤ x.data     │
│├────────────┤  │  │├────────────┤  │  │├────────────┤  │  │├────────────┤
││            │  │  ││            │  │  ││            │  │  ││            │
│└────────────┘  │  │└────────────┘  │  │└────────────┘  │  │└────────────┘
│                │  │                │  │                │  │
│ user i-node    │  │ staff i-node   │  │ sam i-node     │  │ x.data i-node
│┌────────────┐  │  │┌────────────┐  │  │┌────────────┐  │  │┌────────────┐
└►            │  │  └►            │  │  └►            │  │  └►            │
 │ location   ├──┘   │ location   ├──┘   │ location   ├──┘   │ location   ├──► file
 └────────────┘      └────────────┘      └────────────┘      └────────────┘
• The sequence of events is:
the root directory is searched for the user directory file entry
the i-node number is extracted and the location of the user directory found
from the i-node
the user directory is searched for the staff directory file entry
the i-node number is extracted and the location of the staff directory found
from the i-node
the staff directory is searched for the sam directory file entry
the i-node number is extracted and the location of the sam directory found
from the i-node
the sam directory is searched for the x.data file entry
the i-node number is extracted and the locations of the x.data file
found from the i-node
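The sequence above can be sketched as a small simulation. Everything in it is illustrative: the i-node table is a Python dictionary, and all i-node numbers other than the root (2) are invented. A real kernel reads i-nodes and directory blocks from disk at each step.

```python
ROOT_INODE = 2

# Hypothetical i-node table: i-node number -> (type, contents).
# For a directory the contents are its entries (name -> i-node number).
INODES = {
    2: ("dir", {".": 2, "..": 2, "user": 5}),
    5: ("dir", {".": 5, "..": 2, "staff": 9}),
    9: ("dir", {".": 9, "..": 5, "sam": 14}),
    14: ("dir", {".": 14, "..": 9, "x.data": 21}),
    21: ("file", b"contents of x.data"),
}


def lookup(path):
    """Return the i-node number for an absolute path name.

    Follows the loop i-node -> directory data -> i-node -> ... from the root.
    """
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:
            continue                      # the path was just "/"
        kind, entries = INODES[inode]     # fetch the i-node, then its data
        if kind != "dir":
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(path)
        inode = entries[name]             # i-node number from the entry
    return inode


print(lookup("/user/staff/sam/x.data"))
```

Each path component costs at least one i-node fetch and one directory read, which is one reason recently used directories are kept in memory.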
5.4 The i-node and data addressing
• The addresses of a file's data blocks are stored in the i-node:
10 direct pointers
1 single indirect pointer to a block of addresses
1 double indirect pointer to a block, which contains pointers to
blocks of addresses.
Some systems have a third level (triple indirect) pointer.
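To see how a logical block number is found through this scheme, here is a minimal sketch. It assumes 1 Kbyte blocks and 4 byte pointers (so 256 addresses per indirect block). These defaults and the function names are assumptions for illustration, not values fixed by the notes.

```python
NUM_DIRECT = 10  # direct pointers held in the i-node


def locate_block(block_no, block_size=1024, pointer_size=4):
    """Return (level, indices) saying which pointers lead to a logical block."""
    per_block = block_size // pointer_size   # addresses held by one indirect block
    if block_no < NUM_DIRECT:
        return ("direct", (block_no,))
    block_no -= NUM_DIRECT
    if block_no < per_block:
        return ("single indirect", (block_no,))
    block_no -= per_block
    if block_no < per_block ** 2:
        return ("double indirect", divmod(block_no, per_block))
    block_no -= per_block ** 2
    if block_no < per_block ** 3:
        first, rest = divmod(block_no, per_block ** 2)
        second, third = divmod(rest, per_block)
        return ("triple indirect", (first, second, third))
    raise ValueError("block number beyond the maximum file size")


def max_file_blocks(block_size=1024, pointer_size=4):
    """Largest file, in blocks, reachable with a third level pointer."""
    n = block_size // pointer_size
    return NUM_DIRECT + n + n ** 2 + n ** 3


print(locate_block(9))
print(locate_block(10))
print(locate_block(300))
print(max_file_blocks())
```

Small files are reached with no extra disk reads beyond the i-node. Each level of indirection adds one more block read.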
5.4.1 How big should the pointers be?
• Early versions of Unix used 16 bit pointers.
• With 1K blocks this meant you were limited to 64 Mbytes (2^16 x 1 Kbyte) as the largest disk.
• Later versions of Unix used 32 bit pointers and 4K (or 8K) blocks.
• With 4K blocks this gives a maximum file/disk size of 2^32 x 4 Kbytes = 16 Tbytes.
• Modern versions of Unix use 64 bit pointers.
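The limits above follow from multiplying the number of distinct block addresses by the block size. A short check in binary units (1 Mbyte = 2^20 bytes, 1 Tbyte = 2^40 bytes); the function name is made up for illustration:

```python
def max_disk_bytes(pointer_bits, block_size):
    """Largest disk addressable: number of block addresses x block size."""
    return (2 ** pointer_bits) * block_size


MBYTE = 2 ** 20
TBYTE = 2 ** 40

print(max_disk_bytes(16, 1024) // MBYTE, "Mbytes")   # 16 bit pointers, 1K blocks -> 64
print(max_disk_bytes(32, 4096) // TBYTE, "Tbytes")   # 32 bit pointers, 4K blocks -> 16
```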
5.4.2 File Operations
• Opening an existing file
• Creating a file
• Reading from a file
• Writing to a file
• Closing a file
• Deleting a file
• Changing access permissions
• Renaming a file
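As a minimal sketch, the operations listed above map onto system calls that Python exposes through its standard os module. The file names used here are hypothetical, and a POSIX system is assumed.

```python
import os
import tempfile


def demo_file_operations(directory):
    """Run each basic file operation once inside `directory`."""
    path = os.path.join(directory, "notes.txt")

    # Creating a file (fails if it already exists) and opening it for writing
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.write(fd, b"hello")               # writing to a file
    os.close(fd)                         # closing a file

    fd = os.open(path, os.O_RDONLY)      # opening an existing file
    data = os.read(fd, 100)              # reading from a file
    os.close(fd)

    os.chmod(path, 0o600)                # changing access permissions
    new_path = os.path.join(directory, "renamed.txt")
    os.rename(path, new_path)            # renaming a file
    os.remove(new_path)                  # deleting a file
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_file_operations(d))
```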
5.4.3 Links in Unix
• Each file has one i-node but may have many directory entries
• Each name entry is a link to an i-node
• links may be hard or soft
• Hard links
each directory entry points directly at same i-node
the i-node maintains count of links to it
this only operates on a single device
• Soft Links
Special file containing the path to the target file
Separate i-node
Can span devices
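The difference between the two kinds of link can be observed directly. This sketch assumes a POSIX system and uses hypothetical file names. It creates one hard link and one soft link to the same file and compares the i-nodes.

```python
import os
import tempfile


def demo_links(directory):
    """Create a hard and a soft link to one file and report what they share."""
    target = os.path.join(directory, "target.txt")
    with open(target, "w") as f:
        f.write("data")

    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")
    os.link(target, hard)       # second directory entry for the same i-node
    os.symlink(target, soft)    # special file holding the path to the target

    return {
        "hard_shares_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(target).st_nlink,
        # lstat examines the link itself rather than the file it points to
        "soft_has_own_inode": os.lstat(soft).st_ino != os.stat(target).st_ino,
        "soft_points_to": os.readlink(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key, value in demo_links(d).items():
            print(key, value)
```

Because a hard link is just another directory entry naming an i-node number, it cannot refer to an i-node on a different device. A soft link stores a path, so it can.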
5.5 Efficiency and Performance
• Unix uses a large area of memory as a buffer cache to hold disk blocks
• As blocks are read they are stored in the cache. Reading the next block can go on
while the current block is being processed
• Effectiveness depends on whether the cache is sufficiently large
• Further improvement using delayed write (can be a problem if the system crashes)
i-nodes are written back immediately
Written data blocks are flushed after a few seconds:
they are not written to disk at once but marked delayed write.
A block can be modified further before it reaches the head of the list,
when it is then written. Useful if the file is deleted before the block is written.
• Eventually the cache fills
The block that was accessed longest ago is flushed
Read ahead improves efficiency
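The behaviour described above (delayed write, and flushing the block accessed longest ago when the cache fills) can be sketched as follows. This is a simplified model, not the Unix implementation: the "disk" is a Python dictionary, the class and method names are invented, and the periodic flush and read ahead are left out.

```python
from collections import OrderedDict


class BufferCache:
    """Toy buffer cache with delayed write and least-recently-used eviction."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                 # dict: block number -> bytes (stands in for the disk)
        self.blocks = OrderedDict()      # block number -> (data, dirty); oldest first

    def read(self, block_no):
        if block_no in self.blocks:      # cache hit: no disk access needed
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no][0]
        data = self.disk[block_no]       # cache miss: fetch from disk
        self._insert(block_no, data, dirty=False)
        return data

    def write(self, block_no, data):
        # Delayed write: only the cached copy changes for now.
        self._insert(block_no, data, dirty=True)

    def _insert(self, block_no, data, dirty):
        self.blocks[block_no] = (data, dirty)
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            old_no, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:                # flush the block accessed longest ago
                self.disk[old_no] = old_data

    def sync(self):
        """Write every delayed-write block to disk."""
        for block_no, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self.disk[block_no] = data
                self.blocks[block_no] = (data, False)
```

Until a dirty block is evicted or sync is called, the disk still holds the old contents, which is exactly the risk if the system crashes.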
6 Log Structured File Systems
• CPUs are getting faster
• Memory is getting faster
• Disks are getting bigger, but not much faster
This creates a bottleneck in the file system, especially for large file servers (Solution?)
Log structured file systems
• Most accesses are satisfied by the cache
• Writes slow the system (each involves a small quantity of data)
• Disks operate most efficiently with large writes (one or more tracks), therefore:
Collect writes together and write them all at once as a log record.
If the record is big (~ 1 Mbyte) the disk will operate efficiently.
The record contains i-nodes, directories and data mixed up.
A table is needed to keep track of where every i-node is.
Keep this in memory and on disk.
Note:
• Much more complex to administer.
• Eventually the disk fills.
• Have a garbage collection process which goes through log records and
compacts them. The disk operates like a very large circular buffer.
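A much simplified sketch of the idea follows. It assumes each write replaces the whole contents of one i-node, batches a fixed number of writes into each log record, and keeps the i-node table only in memory. The class name LogFS, its methods and the tiny record size are all invented for illustration.

```python
class LogFS:
    """Toy log structured store: writes are batched and appended to a log."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size  # writes collected per log record
        self.log = []                     # the disk: a list of log records
        self.pending = []                 # writes collected but not yet on disk
        self.inode_map = {}               # i-node number -> (record index, offset)

    def write(self, inode_no, data):
        self.pending.append((inode_no, data))
        if len(self.pending) >= self.segment_size:
            self.flush()

    def flush(self):
        """Write all collected updates at once as one log record."""
        if not self.pending:
            return
        index = len(self.log)
        self.log.append(list(self.pending))
        for offset, (inode_no, _) in enumerate(self.pending):
            self.inode_map[inode_no] = (index, offset)   # newest copy wins
        self.pending = []

    def read(self, inode_no):
        for no, data in reversed(self.pending):          # newest unwritten copy first
            if no == inode_no:
                return data
        index, offset = self.inode_map[inode_no]
        return self.log[index][offset][1]

    def compact(self):
        """Garbage collection: keep only the live copy of each i-node."""
        self.flush()
        live = [(no, self.log[i][o][1]) for no, (i, o) in sorted(self.inode_map.items())]
        self.log, self.inode_map = [], {}
        for start in range(0, len(live), self.segment_size):
            self.pending = live[start:start + self.segment_size]
            self.flush()
```

Old copies of an i-node stay in the log until compaction removes them, which is why the disk eventually fills without garbage collection.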
7 DOS/Windows
• The file system has
Boot sector
FAT
Root directory and data blocks
• The directory entry contains all the details about the file, including the name.
• It has a pointer to the first block.
• To find the next block the system uses a FAT (File Allocation Table).
• The FAT is a large one dimensional array. There is an entry for each block
which contains either
the address of the next block or
an End Of File marker
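Following a file through the FAT can be sketched as below. The table contents, the end-of-file value -1 and the use of None for free blocks are assumptions made for illustration, not the real on-disk encodings.

```python
EOF = -1   # hypothetical End Of File marker


def file_blocks(fat, first_block):
    """Return the block numbers of a file by following its chain in the FAT."""
    blocks = []
    seen = set()
    block = first_block                 # taken from the directory entry
    while block != EOF:
        if block in seen:               # guard against a corrupted (looping) chain
            raise ValueError("cycle in FAT chain")
        seen.add(block)
        blocks.append(block)
        block = fat[block]              # the entry holds the next block's address
    return blocks


# Hypothetical 8 entry FAT: a file occupying blocks 4 -> 7 -> 2.
fat = [None] * 8
fat[4] = 7
fat[7] = 2
fat[2] = EOF

print(file_blocks(fat, 4))   # [4, 7, 2]
```

Reaching block N of a file means following N entries of the chain, but the FAT is normally held in memory, so this is much cheaper than following links stored in the data blocks themselves.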