File Systems - Brock University


1

UNIX Internals – The New Frontiers Chapters 8 & 9

File Systems

2

Contents

• The User Interface to Files
• File Systems
• File System Framework
• The Vnode/VFS Architecture
• Implementation Overview
• File-System-Dependent Objects
• Mounting a File System
• Operations on Files
• The System V File System (s5fs)
• s5fs Kernel Organization

3

8.2 The User Interface

• Key abstractions: files, directories, file descriptors, file systems
• Files and directories
  - File: logically, a container for data
  - Directories form a hierarchical, tree-structured name space
  - Pathname: all the components on the path from the root to the node, separated by “/”
  - “.” refers to the directory itself and “..” to its parent
  - Link: a directory entry for a file

4

Directory tree

5

Operation on directory

• dirp = opendir(const char *filename);
• direntp = readdir(dirp);
• rewinddir(dirp);
• status = closedir(dirp);

struct dirent {
    ino_t d_ino;                 /* inode number */
    char  d_name[NAME_MAX + 1];  /* null-terminated file name */
};
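The directory calls above can be combined into a simple search loop. The following is a minimal sketch, assuming a POSIX system; `has_entry` is a hypothetical helper name, not something from the slides.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Return 1 if directory `path` contains an entry called `name`,
 * 0 if it does not, and -1 if the directory cannot be opened.
 * (has_entry is a hypothetical helper, used only for illustration.) */
int has_entry(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);

    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    int found = 0;
    struct dirent *direntp;
    while ((direntp = readdir(dirp)) != NULL) {   /* NULL marks end of directory */
        if (strcmp(direntp->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dirp);
    return found;
}
```

A caller would write something like `has_entry("/etc", "passwd")`; the application never sees the on-disk directory format, only `struct dirent` records.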

6

File Attributes

• Kept in the inode (index node)
• File attributes:
  - File type
  - Number of hard links
  - File size
  - Device ID
  - Inode number
  - User and group IDs of the owner of the file
  - Timestamps
  - Permissions and mode flags

7

Permissions and mode flags

• Owner, group, others (3 x 3 bits)
• Read, write, execute (3 bits each)
• Mode flags (apply to executable files):
  - suid, sgid: set the process's effective UID or GID to that of the file's owner or group
  - sticky: retain the program image in the swap area after execution

8

System calls

• link, unlink: create and delete hard links
• utimes: change the access and modification timestamps
• chown: change the owner UID and GID
• chmod: change permissions and mode flags

9

File Descriptors

• fd = open(path, oflag, mode);
• fd is a per-process object.

10

File descriptors

11

File I/O

• Random and sequential access
• lseek: random access
• nread = read(fd, buf, count);
• write has similar semantics
• Operations on a file are serialized
• In append mode, the offset pointer is set to the end of the file before each write
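The difference between sequential and random access can be sketched as follows: `write` advances the offset pointer on its own, while `lseek` repositions it explicitly before a `read`. Both `read_at` and `make_scratch_file` are hypothetical helpers, and the scratch file location under `/tmp` is an assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `count` bytes starting at byte `offset` of the open file `fd`.
 * Returns what read() returns, or -1 if the seek fails. */
ssize_t read_at(int fd, off_t offset, void *buf, size_t count)
{
    assert(buf != NULL);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, count);
}

/* Create an unlinked scratch file holding `text` (written sequentially).
 * Returns an open descriptor, or -1 on failure. */
int make_scratch_file(const char *text)
{
    char path[] = "/tmp/fsdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the file lives on until the descriptor is closed */

    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After `make_scratch_file("hello world")`, a call such as `read_at(fd, 6, buf, 5)` fetches the second word without reading the first.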

12

Scatter-Gather I/O

• nbytes = writev(fd, iov, iovcnt);
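A minimal sketch of gather output with `writev`: two separate buffers are written with a single system call, so no intermediate copy into one contiguous buffer is needed. `write_record` is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write a header and a body, held in separate buffers, with one system call.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t write_record(int fd, const char *header, const char *body)
{
    assert(header != NULL && body != NULL);

    struct iovec iov[2];
    iov[0].iov_base = (void *)header;   /* writev only reads from the buffers */
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);          /* iovcnt = 2 */
}
```

The matching scatter call, `readv`, takes the same arguments and fills the buffers in order.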

13

File Locking

• Each read and write system call is atomic.
• Advisory locks: protect only against cooperating processes; flock() in 4BSD
• SVR3: locking must first be enabled on the file through chmod
• SVR4: read and write locks
• Mandatory locks: enforced by the kernel
• C library function: lockf
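As a sketch of advisory locking on a modern POSIX system, the following takes and releases a write lock on a whole file through `fcntl`. This is the POSIX record-locking interface rather than the 4BSD `flock()` call mentioned above, and the helper names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Apply lock type `type` (F_WRLCK, F_RDLCK or F_UNLCK) to the whole file. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                     /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);      /* F_SETLK does not block */
}

/* Try to take an advisory write lock without blocking.
 * Returns 0 on success, -1 if the lock cannot be granted. */
int lock_whole_file(int fd)
{
    assert(fd >= 0);
    return set_whole_file_lock(fd, F_WRLCK);
}

/* Release any lock this process holds on the file. */
int unlock_whole_file(int fd)
{
    assert(fd >= 0);
    return set_whole_file_lock(fd, F_UNLCK);
}
```

Because the lock is advisory, a process that never calls `lock_whole_file` can still read and write the file freely.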

14

8.3 File systems

• Mount-on
  - a directory is covered by the mounted file system
  - mount table (original) and vfs list (modern)
• Restrictions
  - a file cannot span file systems
  - each file system must reside on a single logical disk

15

16

Logical Disks

• A logical disk is a storage abstraction that the kernel sees as a linear sequence of fixed-size, randomly accessible blocks.
• newfs, mkfs: create a file system on a logical disk
• Traditional: a partition is the physical storage of a file system
• Modern configurations:
  - Volume (several disks combined)
  - Disk mirroring
  - Stripe sets
  - RAID (Redundant Array of Inexpensive Disks)

17

Special files

• Generalization of the file abstraction to include all kinds of I/O-related objects, such as directories, symbolic links, hardware devices (disks, terminals, printers), pseudodevices such as the system memory, and communications abstractions such as pipes and sockets
• Problems with hard links: they may not span file systems, links to directories can be created by the superuser only, ownership problems

18

Special files

• Symbolic link: a special file that points to another file (the linked-to file); the data portion of the file contains the pathname of the linked-to file, which may be stored in the inode of the symbolic link itself (more on this in Practical UNIX Programming, pp. 90-96)
• Pipes: created by the pipe system call, deleted by the kernel automatically
• FIFOs: created by the mknod system call, must be explicitly deleted
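Since the data portion of a symbolic link is just a pathname, it can be read back with `readlink`. The sketch below assumes a POSIX system; `link_target` is a hypothetical wrapper that adds the terminating NUL, which `readlink` itself does not supply.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the pathname stored in symbolic link `link_path` into `out` and
 * NUL-terminate it. Returns the pathname length, or -1 on error. */
ssize_t link_target(const char *link_path, char *out, size_t outsz)
{
    assert(link_path != NULL && out != NULL && outsz > 0);

    ssize_t n = readlink(link_path, out, outsz - 1);  /* leaves room for the NUL */
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

The linked-to file does not have to exist: `symlink("some/target/path", "/tmp/lnk")` succeeds even when the target is missing, which is one way symbolic links differ from hard links.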

19

8.5 File System Framework

• Traditional UNIX cannot support more than one type of file system.
• New developments (DOS, file sharing, RFS, NFS) required the framework to change.
• AT&T: file system switch
• Sun Microsystems: vnode/vfs
• DEC: gnode
• SVR4: (AT&T + vnode/vfs + NFS) -> de facto standard

20

8.6 The Vnode/Vfs Architecture

• Objectives
  - Support several file system types simultaneously.
  - Different disk partitions may contain different types of file systems.
  - Support for sharing files over a network.
  - Vendors should be able to create their own file system types and add them to the kernel.

21

Lessons from Device I/O

• Devices: block and character
• Character device switch:

struct cdevsw {
    int (*d_open)();
    int (*d_close)();
    int (*d_read)();
    int (*d_write)();
} cdevsw[];

• Major device number: used as the index into cdevsw[]

22

read system call (in traditional UNIX)

1) Use the file descriptor to get to the open file object;
2) Check the entry to see if the file is open for read;
3) Get the pointer to the in-core inode from this entry;
4) Lock the inode so as to serialize access to the file;
5) Check the inode mode field and find that the file is a character device file;
6) Use the major device number to index into a table of character devices and obtain the cdevsw entry for this device;
7) From the cdevsw, obtain the pointer to the d_read routine for this device;
8) Invoke the d_read operation to perform the device-specific processing of the read request;
9) Unlock the inode and return to the user.
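Steps 6 to 8 amount to an indirect call through a table of function pointers. The following user-space sketch illustrates the idea; the argument lists, the table size `NCHRDEV`, the two toy drivers, and `cdev_read` are all assumptions made for the example and are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NCHRDEV 4   /* hypothetical number of major device numbers */

struct cdevsw {
    int (*d_open)(int minor);
    int (*d_close)(int minor);
    int (*d_read)(int minor, char *buf, int count);
    int (*d_write)(int minor, const char *buf, int count);
};

/* Toy "null" driver: reads return end-of-file, writes discard the data. */
static int null_open(int minor)  { (void)minor; return 0; }
static int null_close(int minor) { (void)minor; return 0; }
static int null_read(int minor, char *buf, int count)
{ (void)minor; (void)buf; (void)count; return 0; }
static int null_write(int minor, const char *buf, int count)
{ (void)minor; (void)buf; return count; }

/* Toy "zero" driver: reads fill the buffer with zero bytes. */
static int zero_read(int minor, char *buf, int count)
{
    (void)minor;
    for (int i = 0; i < count; i++)
        buf[i] = 0;
    return count;
}

/* The switch table, indexed by major device number. */
static struct cdevsw cdevsw[NCHRDEV] = {
    [1] = { null_open, null_close, null_read, null_write },
    [2] = { null_open, null_close, zero_read, null_write },
};

/* Index by major number, fetch d_read, and invoke it. */
int cdev_read(int major, int minor, char *buf, int count)
{
    assert(buf != NULL);
    if (major < 0 || major >= NCHRDEV || cdevsw[major].d_read == NULL)
        return -1;   /* no driver registered for this major number */
    return (*cdevsw[major].d_read)(minor, buf, count);
}
```

The caller never names a specific driver; adding a device means adding one table entry, which is the property the vnode/vfs design generalizes to file systems.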

23

Lessons from Device I/O

• It is necessary to separate the file subsystem code into file-system-independent code and file-system-dependent code.
• The interface between these two parts is defined by a set of generic functions that are called by the file-system-independent code.

24

Object Oriented Design

25

Overview of the Vnode/Vfs Interface

• A vnode represents a file in the UNIX kernel.
• A vfs represents a file system.

26


27

base class data and operations pointers

• v_data: inode (s5fs), rnode (NFS), tmpnode (tmpfs)
• v_op: vnodeops
• Example: to close the file associated with the vnode
  #define VOP_CLOSE(vp, …) (*((vp)->v_op->vop_close))(vp, …)
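The base-class idea can be sketched in plain C: the generic vnode holds only an operations pointer and an opaque data pointer, and each file system supplies both. In this sketch the structures are cut down to two fields and two operations, and the "toy" file system and its names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct vnode;

struct vnodeops {
    int (*vop_open)(struct vnode *vp);
    int (*vop_close)(struct vnode *vp);
};

struct vnode {
    struct vnodeops *v_op;    /* file-system-dependent operations */
    void            *v_data;  /* file-system-dependent private data */
};

/* Generic code calls through the macros and never names a file system. */
#define VOP_OPEN(vp)  (*((vp)->v_op->vop_open))(vp)
#define VOP_CLOSE(vp) (*((vp)->v_op->vop_close))(vp)

/* A toy file system whose private data just counts opens. */
struct toy_node { int open_count; };

static int toy_open(struct vnode *vp)
{
    ((struct toy_node *)vp->v_data)->open_count++;
    return 0;
}

static int toy_close(struct vnode *vp)
{
    ((struct toy_node *)vp->v_data)->open_count--;
    return 0;
}

static struct vnodeops toy_vnodeops = { toy_open, toy_close };

/* Bind a generic vnode to the toy file system's operations and data. */
void toy_vnode_init(struct vnode *vp, struct toy_node *tp)
{
    assert(vp != NULL && tp != NULL);
    tp->open_count = 0;
    vp->v_op   = &toy_vnodeops;
    vp->v_data = tp;
}
```

After `toy_vnode_init(&vn, &tn)`, the call `VOP_CLOSE(&vn)` reaches `toy_close` purely through `v_op`, which is how one kernel code path can serve s5fs, NFS and tmpfs vnodes alike.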

28

VFS base class

29

8.7 Implementation Overview

• Objectives
  - Each operation must be carried out on behalf of the current process.
  - Certain operations may need to serialize access to the file.
  - The interface must be stateless and reentrant.
  - File system implementations should be allowed to use global resources, such as the buffer cache.
  - The interface should be usable by the server side of a remote file system.
  - The use of fixed-size static tables must be avoided.

30

Vnodes and Open Files

• The vnode is the fundamental abstraction that represents an active file in the kernel.
• Access to a vnode:
  - by a file descriptor
  - by file-system-dependent data structures

31

Data structures

Reference count

32

The Vnode

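struct vnode {
    u_short v_flag;                  /* flags */
    u_short v_count;                 /* reference count */
    struct vfs *v_vfsmountedhere;    /* set if this vnode is a mount point */
    struct vnodeops *v_op;           /* vnode operations vector */
    struct vfs *v_vfsp;              /* file system to which the vnode belongs */
    …
}; // p242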

33

Vnode Reference Count

• It determines how long the vnode must remain in the kernel.
• Reference versus lock
• A reference is acquired:
  - when a file is opened
  - by a process, on its current directory
  - when a new file system is mounted
  - by the pathname traversal routine
• An unlinked file is physically deleted only when its reference count becomes zero.
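The hold and release discipline can be sketched as below. SVR4 provides it through the VN_HOLD and VN_RELE macros; here they are written as ordinary functions, the structure is reduced to two fields, and `inactive_calls` is a hypothetical counter standing in for the file system's inactive processing.

```c
#include <assert.h>
#include <stddef.h>

struct vnode_ref {
    unsigned short v_count;   /* reference count */
    int inactive_calls;       /* stand-in for the file system's cleanup hook */
};

/* Acquire a reference: the vnode may not be reclaimed while it is held. */
void vn_hold(struct vnode_ref *vp)
{
    assert(vp != NULL);
    vp->v_count++;
}

/* Drop a reference; the last release makes the vnode inactive. */
void vn_rele(struct vnode_ref *vp)
{
    assert(vp != NULL && vp->v_count > 0);
    if (--vp->v_count == 0)
        vp->inactive_calls++;   /* vnode may now be reclaimed or reused */
}
```

A reference only keeps the vnode in memory. It does not serialize access, which is why it is contrasted with a lock above.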

34

The Vfs Object

struct vfs {
    struct vfs *vfs_next;
    struct vfsops *vfs_op;
    struct vnode *vfs_vnodecovered;
    int vfs_fstype;
    caddr_t vfs_data;
    dev_t vfs_dev;
    …
}; // p243

35

36

8.8 File-System-Dependent Objects

• The per-file private data
  - The vnode is an abstract object.

37

The vnodeops Vector

struct vnodeops {
    int (*vop_open)();
    int (*vop_close)();
    …
}; // p245

For ufs:

struct vnodeops ufs_vnodeops = {
    ufs_open,
    ufs_close,
    …
}; // p246

38

39

File-System-Dependent Parts of the Vfs Layer

struct vfsops {
    int (*vfs_mount)();
    int (*vfs_unmount)();
    int (*vfs_root)();
    int (*vfs_statvfs)();
    int (*vfs_sync)();
    …
}; // p246

40

8.9 Mounting a File System

41

• mount(spec, dir, flags, type, dataptr, datalen) // SVR4
• Virtual File System Switch: a global table containing one entry for each file system type.

struct vfssw {
    char *vsw_name;
    int (*vsw_init)();
    struct vfsops *vsw_vfsops;
    …
} vfssw[];

42

mount Implementation

• Adds the vfs structure to the linked list headed by rootvfs.
• Sets the vfs_op field to the vfsops vector specified in the switch entry.
• Sets the vfs_vnodecovered field to point to the vnode of the mount point directory.
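The three steps above, preceded by the lookup of the file system type in the switch, can be sketched in user-space C. The structures are cut down to the fields involved, the two switch entries are placeholders, and `toy_mount` is a hypothetical name; a real mount would go on to call VFS_MOUNT.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct vnode  { int v_unused; };          /* placeholder for the mount point */
struct vfsops { int (*vfs_mount)(void); };

struct vfs {
    struct vfs    *vfs_next;
    struct vfsops *vfs_op;
    struct vnode  *vfs_vnodecovered;
    int            vfs_fstype;
};

struct vfssw {
    const char    *vsw_name;
    struct vfsops *vsw_vfsops;
};

static struct vfsops s5_vfsops, ufs_vfsops;   /* empty placeholder vectors */
static struct vfssw vfssw[] = {
    { "s5",  &s5_vfsops  },
    { "ufs", &ufs_vfsops },
};
#define NFSTYPE ((int)(sizeof vfssw / sizeof vfssw[0]))

static struct vfs root_fs;                /* stands in for the root file system */
struct vfs *rootvfs = &root_fs;           /* head of the vfs list */

/* Find `type` in the switch, then create and link a vfs for it.
 * Returns the new vfs, or NULL for an unknown type or allocation failure. */
struct vfs *toy_mount(const char *type, struct vnode *mountpoint)
{
    assert(type != NULL && mountpoint != NULL);

    for (int i = 0; i < NFSTYPE; i++) {
        if (strcmp(vfssw[i].vsw_name, type) != 0)
            continue;

        struct vfs *vfsp = calloc(1, sizeof *vfsp);
        if (vfsp == NULL)
            return NULL;

        vfsp->vfs_fstype       = i;
        vfsp->vfs_op           = vfssw[i].vsw_vfsops;   /* from the switch entry */
        vfsp->vfs_vnodecovered = mountpoint;
        vfsp->vfs_next         = rootvfs->vfs_next;     /* link in after the root */
        rootvfs->vfs_next      = vfsp;
        return vfsp;
    }
    return NULL;
}
```

Looking the type up by name is what lets a vendor add a file system by adding one switch entry, without touching the generic mount code.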

43

VFS_MOUNT processing

• Verify permissions for the operation.
• Allocate and initialize the private data object of the file system.
• Store a pointer to it in the vfs_data field of the vfs object.
• Access the root directory of the file system and initialize its vnode in memory.

44

8.10 Operations on Files

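Pathname Traversal

lookuppn(): a relative pathname starts at the current directory (u_cdir). For each component:

1. Verify that v_type of the current vnode is a directory.
2. “..” at the system root: move on to the next component.
3. “..” at the root of a mounted file system: access the mount point.
4. Call VOP_LOOKUP on the directory.
5. Component not found: if it is the last one, success; else error ENOENT.
6. A mount point: go to the root of the mounted vfs.
7. A symbolic link: translate it and append the rest of the pathname.
8. Release the directory.
9. Go back to the top of the loop.
10. Terminate; do not release the reference on the final vnode. // p250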

45

Opening a file

fd = open(pathname, mode)

1. Allocate a descriptor.
2. Allocate an open file object.
3. Call lookuppn().
4. Check the vnode for permissions.
5. Check that the requested operations are allowed.
6. If the file does not exist: with O_CREAT, call VOP_CREATE; otherwise return ENOENT.
7. Call VOP_OPEN.
8. If O_TRUNC is set, call VOP_SETATTR.
9. Initialize the open file object.
10. Return the index of the file descriptor. // p252

46

Other topics

• File I/O
• File attributes
• User credentials
• Analysis
• Drawbacks of the SVR4 implementation
• The 4.4BSD model

47

Chapter 9

File System Implementations

48

9.2 The System V File System(s5fs)

• Layout of an s5fs partition: boot area (B) | superblock (S) | inode list | data blocks
• Directories: an s5fs directory is a special file containing a list of files and subdirectories.

49

Inodes

• The inode contains administrative information, or metadata.
• The inode list contains all the inodes.
• On-disk inode: see Table 9-1
• In-core inodes have more fields

50

Inode Fields

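51

Bit-fields of di_mode

52

Block array of the inode: di_addr

Blocks reachable from the inode (1 KB blocks, 256 block numbers per indirect block):
  direct:           10 blocks                  10 KB
  single indirect:  256 blocks                 256 KB
  double indirect:  256*256 = 65K blocks       65 MB
  triple indirect:  256*256*256 = 16M blocks   16 GB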
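The arithmetic above determines how many indirect blocks must be read to reach a given logical block of a file. A small sketch, assuming the figures shown (10 direct addresses and 256 block numbers per indirect block); `indirection_level` is a hypothetical name.

```c
#include <assert.h>

#define NDIRECT 10L    /* direct block addresses in di_addr */
#define NINDIR  256L   /* block numbers held by one indirect block */

/* Return the number of indirect blocks traversed to reach logical block
 * `lbn` of a file: 0 (direct) up to 3 (triple indirect), or -1 if the
 * block lies beyond the maximum file size. */
int indirection_level(long lbn)
{
    assert(lbn >= 0);

    if (lbn < NDIRECT)
        return 0;
    lbn -= NDIRECT;

    if (lbn < NINDIR)
        return 1;
    lbn -= NINDIR;

    if (lbn < NINDIR * NINDIR)
        return 2;
    lbn -= NINDIR * NINDIR;

    if (lbn < NINDIR * NINDIR * NINDIR)
        return 3;
    return -1;
}
```

With 1 KB blocks, byte offset 300000 falls in logical block 292, which lies just past the 266 blocks covered by the direct and single indirect addresses, so reaching it costs two extra block reads.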

53

The superblock

• Size in blocks of the file system
• Size in blocks of the inode list
• Number of free blocks and inodes
• Free block list
• Free inode list

54

Free block list

55

9.3 s5fs Kernel Organization

• In-core inodes contain these additional fields:
  - The vnode
  - Device ID
  - Inode number of the file
  - Flags for synchronization and cache management
  - Pointers to keep the inode on a free list
  - Pointers to keep the inode on a hash queue
  - Block number of the last block read

56

Allocating and Reclaiming Inodes

• Inode table (LRU) containing the active inodes
• When the reference count of a vnode reaches 0, the inode is reclaimed and placed on the free list
• iget(): allocates an inode

57

Inode lookup

• s5lookup()
  - Checks the directory name lookup cache.
  - On a cache miss, reads the directory one block at a time, searching the entries for the specified file name.
  - If the file is in the directory, gets the inode number and uses iget() to locate the inode.
  - If the inode is in the table, iget() returns it; otherwise it allocates a new inode, initializes it, copies in the on-disk inode, puts it on the hash queue, and initializes the vnode (v_op, v_data, v_vfsp).
  - Returns the pointer to the inode.
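The directory search on a cache miss can be sketched as a scan over fixed-size entries. The sketch assumes the classic s5fs entry layout of a 2-byte inode number followed by a 14-character name, with inode number 0 marking an unused slot; `dir_block_lookup` is a hypothetical name and the block is passed in already read from disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIRSIZ 14   /* maximum s5fs file name length */

struct s5_direct {
    unsigned short d_ino;           /* inode number; 0 marks an unused slot */
    char           d_name[DIRSIZ];  /* NUL-padded; no NUL if exactly 14 chars */
};

/* Search one directory block for `name`.
 * Returns the inode number, or 0 if the name is not in this block. */
unsigned short dir_block_lookup(const struct s5_direct *blk, size_t nentries,
                                const char *name)
{
    assert(blk != NULL && name != NULL);

    if (strlen(name) > DIRSIZ)
        return 0;                   /* longer than any s5fs name can be */

    for (size_t i = 0; i < nentries; i++) {
        if (blk[i].d_ino == 0)
            continue;               /* deleted or never-used entry */
        if (strncmp(blk[i].d_name, name, DIRSIZ) == 0)
            return blk[i].d_ino;
    }
    return 0;
}
```

The inode number returned here is what would be handed to iget(); the caller repeats the scan block by block until the name is found or the directory ends.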

58

File I/O (1)

• read (to a user buffer address)
  - fd -> open file object; verify the mode -> vnode -> acquire the rw-lock -> call s5read()
  - Offset -> block number and offset within the block -> uiomove() -> calls copyout()
  - Page not in memory? Page fault -> the handler -> s5getpage() -> calls bmap()
  - bmap() maps the logical block to a physical block; the vnode's page list is searched; if the page is not there, a free page is allocated and the disk driver is called to read the data from disk
  - The process sleeps until the I/O completes. Before copying to user space, copyout() verifies that the user has access.
  - s5read() returns; the caller unlocks the vnode, advances the offset, and returns the number of bytes read

59

File I/O (2)

• write:
  - Data is not written to disk immediately
  - May increase the file size
  - May require the allocation of data blocks
  - For a partial-block write: read the entire block, modify the relevant data, write back the whole block

60

Allocating and reclaiming Inodes

• When the reference count drops to 0 …
• When a file becomes inactive …
• It is better to reuse inodes …

61

Analysis of s5fs

• Reliability concern: a single copy of the superblock
• Performance:
  - 2 disk I/Os per file access (inode, then data)
  - Blocks randomly located
  - Block size: 512 bytes (SVR2), 1024 bytes (SVR3)
• Name length: 14 characters
• Inode limit: 65535

62

The Berkeley Fast File System

• Hard disk structure
• On-disk organization
  - Blocks and fragments
  - Allocation policy
• FFS functionality enhancements
  - long file names
  - symbolic links
  - other enhancements
• Analysis

63

Other file systems

• Temporary file systems (RAM disk, mfs, tmpfs)
• The specfs file system
• The /proc file system

64

Linux Virtual File System

• Uniform file system interface to user processes
• Represents the general features and behavior of any conceivable file system
• Assumes files are objects that share basic properties regardless of the target file system

65

66

67

Primary Objects in VFS

• Superblock object: represents a specific mounted file system
• Inode object: represents a specific file
• Dentry object: represents a specific directory entry
• File object: represents an open file associated with a process