Chapter 8 File Management


8.1 Introduction
• Data should be organized in a convenient and efficient manner. In particular, users should be able to:
  – Put data into files
  – Find and use files that have previously been created

File System
• A set of OS services that provides files and directories for user applications

8.2 Files
• A file is simply a sequence of bytes stored on some storage device of the computer.
• Those bytes contain whatever data we would like to store in the file, such as:
  – A text file containing just the characters that we are interested in
  – A word processing document file that also contains data about how to format the text
  – A database file that contains data organized in multiple tables
• In general, the File Management system has no knowledge of how the data in a file is organized. That is the responsibility of the application programs that create and use the file.

Permanent (non-volatile) Storage Devices
• Disk drives
• Flash memory (memory stick)
• CDs and DVDs
• Magnetic tape drives

8.2.1 File Attributes
• Name – Symbolic (human-readable) name of the file
• Type – Executable file, print file, etc.
• Location – Where the file is on disk
• Size
• Protection – Who can read the file, write the file, etc.
• Time, date – When the file was created, modified, accessed

8.2.2 Folders
• An important attribute of a folder is its Name.
• Typically, a folder may contain files and other folders (commonly called sub-folders or sub-directories).
• This results in a tree structure of folders and files.

[Figure: Folder/Directory Tree Structure]

8.2.3 Pathnames
• The pathname of a file specifies the sequence of folders one must traverse to travel down the tree to the file.
• This pathname actually describes the absolute path of the file, which is the sequence of folders one must travel from the root of the tree to the desired file.
• A relative path describes the sequence of folders one must traverse starting at some intermediate place on the absolute path.
• The absolute path provides a unique identification for a file. Two different files can have the same filename as long as the resulting pathnames are unique.
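The difference between absolute and relative paths can be sketched in a few lines of Python. The folder tree used here (/home/alice/docs/report.txt and a second file under /home/bob) is hypothetical and only serves to illustrate the points above.

```python
from pathlib import PurePosixPath

# Hypothetical absolute path: starts at the root of the tree.
absolute = PurePosixPath("/home/alice/docs/report.txt")

# The sequence of folders traversed from the root down to the file.
folders = absolute.parent.parts

# A relative path starts at some intermediate folder on the absolute path.
start = PurePosixPath("/home/alice")
relative = absolute.relative_to(start)

# Joining the starting folder and the relative path gives the absolute path back.
rejoined = start / relative

# Same filename, different absolute path: a different file.
other = PurePosixPath("/home/bob/docs/report.txt")
same_name = other.name == absolute.name
same_file = other == absolute
```

PurePosixPath only manipulates path strings, so the sketch runs without the folders actually existing.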

File Links
• Allow a directory entry to point to a file (or another entry) that is not directly below it in the tree structure
  – Unix: Symbolic Link
  – Windows: Shortcut

[Figure: Link in Directory Tree Structure]

8.3 Access Methods
• An access method describes the manner and mechanisms by which a process accesses the data in a file.
• There are two common access methods:
  – Sequential
  – Random (or Direct)
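A minimal sketch of the two access methods, assuming a hypothetical file made of fixed-size 4-byte records. Sequential access reads the records one after another; random (direct) access jumps straight to a record by computing its position.

```python
import os
import tempfile

RECORD_SIZE = 4  # assumed fixed record size, for illustration only

path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    f.write(b"AAAABBBBCCCCDDDD")   # four records

with open(path, "rb") as f:
    # Sequential access: each read continues where the previous one stopped.
    sequential = [f.read(RECORD_SIZE) for _ in range(4)]

    # Random access: move directly to record number 2 (counting from 0).
    f.seek(2 * RECORD_SIZE)
    third = f.read(RECORD_SIZE)
```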

File Operations
When a process needs to use a file, there are a number of operations it can perform:
• Open
• Close
• Read
• Write

Create File
• Allocate space for the file
• Make an entry for the file in the Directory

8.3.1 Open File
• Makes the file accessible for read/write operations
• Locates the file in the Directory
• Returns an internal ID for the file
  – Commonly called a Handle
  – handle = open(filename, parameters)

[Figure: File Open]

8.3.2 Close File
• Makes the file no longer accessible from the application
  – Deletes the Handle created by Open

[Figure: File Close]

8.3.3 Read File
• The system call specifies:
  – Handle from the Open call
  – Memory location (buffer) and length of the information to be read
  – Possibly, the location in the file where the data is to be read from
  – read(file handle, buffer)
  – read(file handle, buffer, length)
• Uses the Handle to locate the file on disk
• Uses the file’s Read Pointer to determine the position in the file to read from
• Updates the file’s Read Pointer

8.3.4 Write File
• The system call specifies:
  – Handle from the Open call
  – Memory location (buffer) and length of the information to be written
  – Possibly, the location in the file where the data is to be written
  – write(file handle, buffer, length)
• Uses the Handle to locate the file on disk
• Uses the file’s Write Pointer to determine the position in the file to write to
• Updates the file’s Write Pointer
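The open, write, read, and close calls described above map closely onto the low-level functions of Python's os module, where the Handle is an integer file descriptor. The following is a sketch only; the file name demo.txt and its contents are made up for the example.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Open (creating the file if needed) returns the handle.
handle = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Write: handle plus the buffer to be written; returns the number of bytes written.
written = os.write(handle, b"hello, file")

# Move the file position back to the start, then read 5 bytes through the same handle.
os.lseek(handle, 0, os.SEEK_SET)
data = os.read(handle, 5)

# Close: the handle is no longer usable afterwards.
os.close(handle)
try:
    os.read(handle, 1)
    handle_rejected = False
except OSError:
    handle_rejected = True
```

Unlike the slides' model of separate read and write pointers, this interface keeps a single file position per handle, which is why the sketch repositions with os.lseek before reading.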

Delete File
• Deletes the entry for the file in the Directory
• De-allocates the disk space used by the file

8.3.5 Sequential Access
• If the process has opened a file for sequential access, the File Management subsystem will keep track of the current file position for reading and writing.
• To carry this out, the system maintains a file pointer that holds the position of the next read or write.

File Pointer
• The value of the file pointer is initialized during Open to one of two possible values:
  – Normally, it is set to 0 so that reading or writing starts at the beginning of the file.
  – If the file is being opened to append data, it is set to the current size of the file.
• After each read or write, the file pointer is incremented by the amount of data that was read or written.
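The behaviour of the file pointer can be observed from Python with tell(), which reports the current position. This is a small sketch using a hypothetical 10-byte file; the variable names are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pointer.bin")

with open(path, "wb") as f:
    start_new = f.tell()        # pointer right after a normal open
    f.write(b"abcdefghij")      # write 10 bytes
    after_write = f.tell()      # advanced by the amount written

with open(path, "rb") as f:
    start_read = f.tell()       # again at the beginning of the file
    f.read(4)
    after_read = f.tell()       # advanced by the amount read

with open(path, "ab") as f:
    start_append = f.tell()     # opened for append: pointer at the file size

file_size = os.path.getsize(path)
```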

8.3.6 Streams, Pipes, and I/O Redirection
• A Stream is the flow of data bytes, one byte after another, into the process (for reading) and out of the process (for writing).
• This concept applies to Sequential Access and was originally invented for network I/O, but several modern programming environments (e.g. Java, C#) have also incorporated it.

Standard I/O
• Standard Input – Defaults to the keyboard
• Standard Output – Defaults to the console

I/O Redirection
• Standard Input can come from a file
  – app.exe < def.txt
• Standard Output can go to a file
  – app.exe > def.txt
• Standard Output from one application can be Standard Input for another
  – app1.exe | app2.exe
  – This is called a Pipe
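Redirection is normally set up by the command shell, but the same effect can be sketched from a program. In this illustration a small Python child process stands in for app.exe, and the file names def.txt and out.txt are placeholders; the child simply numbers the lines it receives on Standard Input.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "def.txt")
out_path = os.path.join(workdir, "out.txt")

with open(in_path, "w") as f:
    f.write("first\nsecond\n")

# Stand-in for app.exe: copies Standard Input to Standard Output, numbering each line.
child_code = (
    "import sys\n"
    "for n, line in enumerate(sys.stdin, 1): sys.stdout.write(f'{n}: {line}')"
)

# Equivalent in spirit to: app.exe < def.txt > out.txt
with open(in_path) as stdin_file, open(out_path, "w") as stdout_file:
    subprocess.run(
        [sys.executable, "-c", child_code],
        stdin=stdin_file,
        stdout=stdout_file,
        check=True,
    )

with open(out_path) as f:
    redirected = f.read()
```

The child never opens a file itself: it only reads Standard Input and writes Standard Output, and the parent decides where those streams lead.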

[Figure: A Pipe]

Pipe
• A Pipe is a connection that is dynamically established between two processes.
• When a process reads data, the data will come from another process rather than a file.
• Thus, a pipe has a process at one end that is writing to the pipe and another process at the other end that is reading data from the pipe.
• It is often the case that one process produces output that another process needs as input. Rather than having the first process write to a file and the second process read that file, we can save time by having the processes communicate via a pipe.

Pipe and Performance
• Using a pipe can improve system performance in two ways:
  – By not using a file, the applications save the time that disk I/O would take.
  – The receiving process can read whatever data has already been written, so we do not need to wait until the first process has written all of its data before we start executing the second process. This creates a pipeline, similar to an automobile assembly line, that speeds up overall performance.
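A pipe between two processes can be sketched with Python's subprocess module. Two small Python child processes stand in for app1.exe and app2.exe; their code is invented for the example. The consumer starts reading as soon as the producer writes, and no intermediate file is involved.

```python
import subprocess
import sys

# Stand-ins for app1.exe and app2.exe.
producer_code = "for i in range(3): print('line', i)"
consumer_code = "import sys\nfor line in sys.stdin: print(line.strip().upper())"

# The producer's Standard Output is connected to a pipe...
producer = subprocess.Popen(
    [sys.executable, "-c", producer_code],
    stdout=subprocess.PIPE,
)
# ...and the other end of that pipe becomes the consumer's Standard Input.
consumer = subprocess.Popen(
    [sys.executable, "-c", consumer_code],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
# Close the parent's copy so the consumer sees end-of-file when the producer exits.
producer.stdout.close()

output, _ = consumer.communicate()
producer.wait()
lines = output.splitlines()
```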


8.4 Directory Functions
• Search for a file
• Create a file
• Delete a file
• List a directory
• Rename a file
• Traverse the file system
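Most of these directory functions have direct counterparts in Python's os module. The sketch below works in a temporary folder; the names sub, a.txt, and b.txt are hypothetical.

```python
import os
import tempfile

root = tempfile.mkdtemp()
sub = os.path.join(root, "sub")
os.mkdir(sub)

old_path = os.path.join(sub, "a.txt")
new_path = os.path.join(sub, "b.txt")

open(old_path, "w").close()          # create a file
os.rename(old_path, new_path)        # rename a file
listing = sorted(os.listdir(sub))    # list a directory

# Traverse the tree below root and search for a file by name.
found = [
    os.path.join(folder, name)
    for folder, _subfolders, names in os.walk(root)
    for name in names
    if name == "b.txt"
]

os.remove(found[0])                  # delete a file
remaining = os.listdir(sub)
```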

8.5 File Space Allocation
• Contiguous – The file is allocated contiguous disk space

File System Implementation
[Figure: A possible file system layout]
• A Master Boot Record (MBR) is a special type of boot sector at the very beginning of a partitioned mass storage device. The MBR holds the information on how the logical partitions, which contain the file systems, are organized on that medium.

[Figure: Implementing Files (1). (a) Contiguous allocation of disk space for 7 files. (b) State of the disk after files D and E have been removed.]

Contiguous Allocation
• Advantages
  – Simple to implement
  – Good disk I/O performance
• Disadvantages
  – Need to know the maximum file size ahead of time
  – Probably will waste disk space
  – The necessary space may not be available

[Figure: Contiguous Allocation Read/Write Disk Address Calculation]
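The figure itself is not reproduced in this transcript, but the usual address calculation for a contiguously allocated file is simple enough to sketch. The block size of 512 bytes and the function name locate are assumptions made for this example, not values taken from the slides.

```python
BLOCK_SIZE = 512  # assumed block size in bytes


def locate(start_block, file_offset, block_size=BLOCK_SIZE):
    """Map a byte offset within a contiguous file to (disk block, offset in block).

    Because the file's blocks are adjacent on disk, only the number of the
    first block is needed to find any byte of the file.
    """
    block = start_block + file_offset // block_size
    offset_in_block = file_offset % block_size
    return block, offset_in_block


# A file starting at block 100: byte 1300 lies two whole blocks further on.
first_byte = locate(100, 0)
later_byte = locate(100, 1300)
```

This single division is part of why contiguous allocation gives good I/O performance: no table lookup or chain traversal is needed to find a block.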

8.5.1 Cluster Allocation
• Cluster Allocation
  – Disk space is allocated in blocks
  – Space is allocated as needed

[Figure: Cluster Allocation]

[Figure: Implementing Files (3). Linked list allocation using a file allocation table in RAM]

[Figure: Implementing Files (4). An example i-node]

Cluster Allocation
• Advantages
  – Tends not to waste disk space
• Disadvantages
  – Additional overhead to keep track of clusters
  – Can cause poor disk I/O performance
  – May limit the maximum size of the File System

Cluster Performance
• Clusters tend to be scattered around the disk
  – This is called External Fragmentation
  – Can cause poor performance, as the disk arm needs to move a lot
  – Requires a de-fragmentation utility
• Large clusters can reduce External Fragmentation
  – If there are lots of small files, then space will be wasted inside each cluster
  – This is called Internal Fragmentation
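Internal fragmentation is easy to quantify: a file always occupies a whole number of clusters, so the unused tail of its last cluster is wasted. The sketch below uses a hypothetical 1,000-byte file and two cluster sizes for comparison; the function names are illustrative.

```python
def clusters_needed(file_size, cluster_size):
    """Number of whole clusters required to hold file_size bytes (rounded up)."""
    return -(-file_size // cluster_size)


def wasted_bytes(file_size, cluster_size):
    """Bytes allocated to the file but not used by it (internal fragmentation)."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


KB = 1024
small_file = 1000  # hypothetical file size in bytes

waste_4k = wasted_bytes(small_file, 4 * KB)
waste_32k = wasted_bytes(small_file, 32 * KB)
```

With many small files, the waste per file grows roughly in proportion to the cluster size, which is the trade-off against external fragmentation described above.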

Managing Cluster Allocation
• Linked – Each cluster has a pointer to the next cluster
• Indexed – A single table has pointers to each of the clusters

[Figure: Linked Blocks]

[Figure: Index Block]
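The practical difference between the two schemes shows up when a process wants the n-th cluster of a file. The following sketch models both with ordinary Python structures; the cluster numbers 7, 2, and 9 and the END marker are invented for illustration.

```python
END = -1  # assumed marker for "no next cluster"

# Linked: each cluster records the number of the next cluster of the file.
next_cluster = {7: 2, 2: 9, 9: END}
first_cluster = 7

# Indexed: one table lists all clusters of the file in order.
index_block = [7, 2, 9]


def nth_cluster_linked(first, n):
    """Follow the chain n steps from the first cluster."""
    cluster = first
    for _ in range(n):
        cluster = next_cluster[cluster]
        if cluster == END:
            raise IndexError("file has fewer clusters than requested")
    return cluster


def nth_cluster_indexed(n):
    """Look the cluster up directly in the index."""
    return index_block[n]


via_chain = nth_cluster_linked(first_cluster, 2)
via_index = nth_cluster_indexed(2)
```

The linked lookup has to visit every earlier cluster in the chain, while the indexed lookup is a single table access, which matters most for random access to large files.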

8.6 Real-World Systems
• Microsoft FAT
• Microsoft NTFS
• Linux Ext2, Ext3
• Others

8.6.1 MS FAT System
• FAT16 (FAT: file allocation table)
  – MS-DOS, Windows 95
  – Maximum of 2 GB of space for a file system
  – Generally bad disk fragmentation
• FAT32
  – Windows 98
  – Supported by Windows 2000, XP, 2003

[Figure: The MS-DOS File System (1). The MS-DOS directory entry]

[Figure: The Windows 98 File System (1). The extended MS-DOS directory entry used in Windows 98]

Cluster Sizes of FAT16 and FAT32

Drive Size          Default FAT16 Cluster Size    Default FAT32 Cluster Size
260 MB–511 MB       8 KB                          Not supported
512 MB–1,023 MB     16 KB                         4 KB
1,024 MB–2 GB       32 KB                         4 KB
2 GB–8 GB           Not supported                 4 KB
8 GB–16 GB          Not supported                 8 KB
16 GB–32 GB         Not supported                 16 KB
> 32 GB             Not supported                 32 KB
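The 2 GB limit of FAT16 follows from the largest cluster size in the table. As a rough check, assuming all 16-bit cluster numbers were usable (in practice a few values are reserved, so the real limit is slightly lower):

```python
KIB = 1024
GIB = 1024 ** 3

fat16_cluster_numbers = 2 ** 16      # distinct values of a 16-bit cluster number
largest_fat16_cluster = 32 * KIB     # largest default FAT16 cluster size in the table

max_fat16_bytes = fat16_cluster_numbers * largest_fat16_cluster
max_fat16_gib = max_fat16_bytes / GIB
```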

[Figure: Windows FAT Table]

8.6.2 Windows NTFS File System
• The NTFS file system (New Technology File System) is based on a structure called the "master file table" or MFT, which is able to hold detailed information on files.
• NTFS allows the use of long names and, unlike FAT32, it is able to distinguish lower-case and upper-case letters in names, although Windows itself normally treats file names as case-insensitive.
• Available on Windows 2000, XP, 2003
• Maintains a transaction log to recover after a reboot
• Support for file protection
• Large (64-bit) cluster pointers
  – Allows small clusters
  – Reduces internal fragmentation

Windows NTFS File System
• Master File Table: contains records about the files and directories of the partition. The first record, called a descriptor, contains information on the MFT itself (a copy of it is stored in the second record). The third record contains the log file, a file recording all actions performed on the partition. The following records, making up what is known as the core, reference each file and directory of the partition in the form of objects with assigned attributes.

[Figure: File System Structure (1). The NTFS master file table]

[Figure: File System Structure (2). The attributes used in MFT records]

[Figure: File System Structure (3). An MFT record for a three-run, nine-block file]

8.6.3 Linux Ext2 and Ext3 File System

Ext2
• Ext2 stands for second extended file system.
• It was introduced in 1993 and developed by Rémy Card.
• The maximum individual file size can be from 16 GB to 2 TB.

Ext3
• Ext3 stands for third extended file system.
• It was introduced in 2001 and developed by Stephen Tweedie.
• Ext3 has been available since Linux kernel 2.4.15.
• The maximum individual file size can be from 16 GB to 2 TB.

[Figure: UNIX File System (1). Disk layout in classical UNIX systems]

[Figure: UNIX File System (3). The relation between the file descriptor table, the open file description]

[Figure: UNIX File System (2). Directory entry fields. Structure of the i-node]

[Figure: The Linux File System. Layout of the Linux Ext2 file system]