Transcript Chapter11

Operating Systems:
Internals and Design Principles, 6/E
William Stallings
Chapter 11
I/O Management and Disk
Scheduling
Patricia Roy
Manatee Community College, Venice, FL
©2008, Prentice Hall
GIVEN CREDIT WHERE IT IS DUE


Some of the lecture notes are borrowed from Dr.
Mark Temte at Indiana University – Purdue
University Fort Wayne and from Dr. Einar
Vollset at Cornell University
I have modified them and added new slides
I/O MANAGEMENT AND DISK SCHEDULING

We have already discussed I/O techniques




Programmed I/O
Interrupt-driven I/O
Direct memory access (DMA)
Also mentioned logical I/O where the OS hides most
of the details of device I/O in system service
routines. Then processes see devices in general
terms such as read, write, open, close, lock, unlock
3
OTHER ISSUES . . .
I/O buffering
 Physical disk organization
 Need for efficient disk access
 Disk scheduling policies
 RAID

4
I/O BUFFERING
Can be block-oriented or stream-oriented
 Block-oriented buffering

Information is stored in fixed-size blocks
 Transfers are made a block at a time
 Used for disks and tapes

5
I/O BUFFERING

Stream-oriented


Transfer information as a stream of bytes
Used for terminals, printers, communication ports,
mouse and other pointing devices, and most other
devices that are not secondary storage
6
I/O BUFFERING

I/O is a problem under paged virtual memory
The target page of any I/O operation must be present in a
page frame until the transfer is complete
 Otherwise, there can be single-process deadlock


Example



Suppose process P is blocked waiting for I/O event to
complete
Then suppose the target page for the I/O is swapped out
to disk
The I/O operation is subsequently blocked waiting for the
target page to be swapped in, but the rules of suspension
dictate that it stays suspended on the disk until the I/O
event completes!
7
I/O BUFFERING

Resulting OS complications
The target page of any I/O operation must be locked in
memory until the transfer is complete
 A process with pending I/O on any page may not be
suspended


Solution: Do I/O through a system I/O buffer in main
memory assigned to the I/O request
System buffer is locked in memory frame
 Input transfer is made to the buffer
 Block moved to user space when needed


This decouples the I/O transfer from the address space
of the application
8
Physical Disk Organization
sectors
read-write
head
track
9
Physical Disk Organization
boom
cylinder
10
PHYSICAL DISK ORGANIZATION
When the disk drive is operating, the disk is rotating
at constant speed
 To read or write, the disk head must be positioned on
the desired track and at the beginning of the desired
sector
 Once the head is in position, the read or write
operation is then performed as the sector moves
under the head
 Seek time is the time it takes to position the head on
the desired track
 Rotational delay or rotational latency is the
additional time it takes for the beginning of the sector
to reach the head once the head is in position
 Transfer time is the time for the sector to pass
under the head

11
PHYSICAL DISK ORGANIZATION
Access time
= seek time + rotational latency + transfer time
 Efficiency of a sequence of disk accesses strongly
depends on the order of the requests
 Adjacent requests on the same track avoid
additional seek and rotational latency times
 Loading a file as a unit is efficient when the file has
been stored on consecutive sectors on the same
track/cylinder of the disk

12
TRANSFER TIME

Depend on the rotation speed of the disk:
T = b/(rN)
 Where T: transfer time

b: number of bytes to be transferred
N: number of bytes on a track/cylinder
r: rotation speed, in revolutions per second

Thus, the total average access time is Ts + 0.5/r +
b/(rN), where Ts is the average seek time.
A TIME COMPARISON

Consider a disk with an advertised average seek time
of 4ms, rotation speed of 7500 rpm, and 512-byte
sectors with 500 sectors per track. Suppose that we
wish to read a file consisting of 2500 sectors for a total
of 1.28 Mbytes. We would like to estimate the total
time for the transfer.
 1) Sequential organization: the file is stored on 5
adjacent tracks (5 tracks * 500 sectors/track = 2500
sectors).
 2) The sectors are distributed randomly over the
disk.
DISK SCHEDULING POLICIES
Seek time is the reason for differences in performance
 For a single disk there will be a number of I/O
requests
 If requests are selected randomly, get poor
performance

DISK SCHEDULING POLICIES

First-in, first-out (FIFO)
Process request sequentially
 Fair to all processes
 Approaches random scheduling in performance if
there are many processes
 Example: 55, 58, 39, 18, 90, 160, 150, 38, 184

DISK SCHEDULING POLICIES

Shortest Service/Seek Time First




Select the disk I/O request that requires the least
movement of the disk arm from its current position
Always choose the minimum seek time
Example: 55, 58, 39, 18, 90, 160, 150, 38, 184
Requests for tracks far away from the current
position may never be served, if requests for closer
tracks are issued continuously
DISK SCHEDULING POLICIES

SCAN (aka Elevator Algorithm)
Arm moves in one direction only, satisfying all
outstanding requests until it reaches the last track in
that direction
 Direction is reversed
 Example: 55, 58, 39, 18, 90, 160, 150, 38, 184

DISK SCHEDULING POLICIES

C-SCAN
Restricts scanning to one direction only
 When the last track has been visited in one direction,
the arm is returned to the opposite end of the disk
and the scan begins again
 In case of repetitive requests to one track, we will see
“arm stickiness” in SSTF, SCAN, C-SCAN

DISK SCHEDULING POLICIES

N-step-SCAN
Segment the disk request queue into subqueues of
length N
 Subqueues are processed one at a time, using SCAN
 New requests added to other queue when a queue is
processed

DISK SCHEDULING POLICIES

FSCAN
Two subqueues
 One queue is empty for new requests

Disk Scheduling Algorithms
IN-CLASS EXERCISE

Prob. 11.3
a) Perform the same type of analysis as that of the
previous table for the following sequence of disk track
requests: 27, 129, 110, 186, 147, 41, 10, 64, 120.
Assume that the disk head is initially positioned over
track 100 and is moving in the direction of decreasing
track number.
 b) Do the same analysis, but now assume that the
disk head is moving in the direction of the increasing
track number.

RAID MOTIVATION

Disks are improving, but not as fast as CPUs
1970s seek time: 50-100 ms.
 2000s seek time: <5 ms.
 Factor of 20 improvement in 3 decades





We can use multiple disks for improving performance
 By striping files across multiple disks (placing parts of each file
on a different disk), parallel I/O can improve access time
Striping reduces reliability

What’s the mean time between failures of 100 disks, assuming T as the mean
time between failures of one disk?

The mean time between failures of 100 disks = 1/100 times of the mean
time between failures of one disk
So, we need striping for performance, but we need something to help
with reliability
To improve reliability, we can add redundant data to the disks, in
addition to striping
RAID

A RAID is a Redundant Array of Inexpensive Disks






“I” is also for “Independent”
The alternative is SLED, single large expensive disk
Disks are small and cheap, so it’s easy to put lots of disks
(10s to 100s) in one box for increased storage, performance,
and availability
The RAID box with a RAID controller looks just like a
SLED to the computer
Data plus some redundant information is striped across the
disks in some way
How that striping is done is key to performance and
reliability.
RAID LEVEL 0





Level 0 is nonredundant disk array
Files are striped across disks, no redundant info
High read throughput
Best write throughput (no redundant info to write)
Any disk failure results in data loss

Reliability worse than SLED
Stripe 0
Stripe 1
Stripe 2
Stripe 4
Stripe 5
Stripe 6
Stripe 7
Stripe 8
Stripe 9
Stripe 10
Stripe 11
data disks
Stripe 3
RAID LEVEL 1


Mirrored Disks
Data is written to two places


On read, choose fastest to read


On failure, just use surviving disk
Write performance is same as single drive, read performance
is 2x better
Replication redundancy is expensive
Stripe 0
Stripe 1
Stripe 4
Stripe 5
Stripe 8
Stripe 9
Stripe 2
Stripe 3
Stripe 0
Stripe 1
Stripe 6
Stripe 7
Stripe 4
Stripe 5
Stripe 6
Stripe 7
Stripe 10
Stripe 11
Stripe 8
Stripe 9
Stripe 10
Stripe 11
data disks
Stripe 2
mirror copies
Stripe 3
PARITY AND HAMMING CODES

What do you need to do in order to detect and correct a one-bit
error ?



Suppose you have a binary number, represented as a collection of
bits: <b3, b2, b1, b0>, e.g. 0110
Detection is easy
Parity:

Count the number of bits that are on, see if it’s odd or even






EVEN parity is 0 if the number of 1 bits is even
Parity(<b3, b2, b1, b0 >) = P0 = b0  b1  b2  b3
Parity(<b3, b2, b1, b0, p0>) = 0 if all bits are intact
Parity(0110) = 0, Parity(01100) = 0
Parity(11100) = 1 => ERROR!
Parity can detect a single error, but can’t tell you which of the
bits got flipped
HAMMING CODE HISTORY

In the late 1940’s Richard Hamming recognized
that the further evolution of computers required
greater reliability, in particular the ability to not
only detect errors, but correct them. His search
for error-correcting codes led to the Hamming
Codes, perfect 1-error correcting codes, and the
extended Hamming Codes, 1-error correcting and
2-error detecting codes.
PARITY AND HAMMING CODES

Detection and correction require more work

Hamming codes can detect & correct single bit errors

[7,4] binary Hamming Code












h0 = b0  b1  b3
h1 = b0  b2  b3
h2 = b1  b2  b3
H0(<1101>) = 0
H1(<1101>) = 1
H2(<1101>) = 0
Hamming(<1101>) = <b3, b2, b1, h2, b0, h1, h0> = <1100110>
If a bit is flipped, e.g. <1110110>
a = h0  b0  b1  b3 = 1
b = h1  b0  b2  b3 = 0
c = h2  b1  b2  b3 = 1
abc = <101>, the 5th bit is in error and switch it
EXTENDED [8,4] BINARY HAMM.
CODE
As with the [7,4] binary Hamming Code:
 h0 = b0  b1  b3
 h1 = b0  b2  b3
 h2 = b1  b2  b3
 Add a new bit p such that

p = b0  b1  b2  b3  h0  h1  h2 . i.e., the new bit
makes the XOR of all 8 bits zero. p is called a parity check
bit.
 Assume at most 2 errors:

If parity check passes and abc = 000, the system is correct;
 If parity check passes and abc ≠ 000, the system has two errors;
 If parity check fails, there is one error and abc indicates which bit
is in error.

RAID LEVEL 2

Bit-level striping with Hamming (ECC) codes for error correction

All 7 disk arms are synchronized and move in unison

Complicated controller

Single access at a time

Tolerates only one error
Bit 0
Bit 1
Bit 2
data disks
Bit 3
Bit 4
Bit 5
ECC disks
Bit 6
RAID LEVEL 3

Use a parity disk




Each bit on the parity disk is a parity function of the
corresponding bits on all the other disks
A read accesses all the data disks
A write accesses all data disks plus the parity disk
On a disk failure, read remaining disks plus parity disk to
compute the missing data
Bit 0
Bit 1
Bit 2
Bit 3
Parity
Parity disk
data disks
RAID 3
X4(i) = X3(i)  X2(i)  X1(i)  X0(i)
If drive X1 has failed. How to recover it?
RAID LEVEL 4



Combines Level 0 and 3 – block-level parity with Stripes
A read accesses all the data disks
A write accesses all data disks plus the parity disk
Stripe 0
Stripe 1
Stripe 2
Stripe 3
P0-3
Stripe 4
Stripe 5
Stripe 6
Stripe 7
P4-7
Stripe 8
Stripe 9
Stripe 10
Stripe 11
P8-11
Parity disk
data disks
RAID 4 (BLOCK-LEVEL PARITY)
X4(i) = X3(i)  X2(i)  X1(i)  X0(i)
How to execute a write operation, for instance on
drive X1?
Heavy load on the parity disk
RAID 5 (block-level distributed
parity)
RAID 6 (dual redundancy)
• Level 5 with an extra parity bit, which is generated with a different and
independent data check algorithm
• Can tolerate two failures
I/O BUFFERING FOR THROUGHPUT
Perform input transfers in advance of requests being
made
 With I/O buffers, an application can process the data
from one I/O request while awaiting another
 Time needed to process a block of data . . .
without buffering = C + T
with buffering = M + max{ C, T }
where:

C = computation time
T = I/O memory/disk transfer time
M = memory/memory transfer time (buffer to user)

Perform output transfers sometime after the request
is made
40
DOUBLE BUFFERING
Use two system buffers instead of one
 A process can transfer data to or from one buffer
while the operating system empties or fills the
other buffer

41
CIRCULAR BUFFERING
More than two buffers are used
 List of system buffers are arranged in a circle
 Appropriate in applications where there are bursts of
I/O requests

42
SOME RAID ISSUES


Granularity

fine-grained: Stripe each file over all disks. This gives high
throughput for the file, but limits to transfer of 1 file at a time

coarse-grained: stripe each file over only a few disks. This
limits throughput for 1 file but allows more parallel file access
Redundancy

concentrate redundancy info on a small number of disks:
partition the set into data disks and redundant disks

uniformly distribute redundancy info on disks: avoids loadbalancing problems