Operating System
Allen C.-H. Wu
Department of Computer Science
Tsing Hua University
1
Part I: Overview
Ch. 1 Introduction
• Operating system: a program that acts as an
intermediary between a user and the computer
hardware. The goals are to make the computer
system convenient to use and to run it in an
efficient manner.
• Why, what and how?
• DOS, Windows, UNIX, Linux
• Single-user, multi-user
2
1.1 What Is an Operating System
(Figure: abstract view of system components: users at the top, system and application programs, the operating system, and the hardware at the bottom)
• OS=government: resource allocation=> CPU,
memory, IO, storage
• OS: a control program that controls the execution of
user programs to prevent errors and improper use
of the computer.
• Convenience for the user and efficient operation of
the computer system
3
1.2 Batch Systems
• In the early days (before the PC era), computers were
extremely expensive; only a few institutes could
afford them.
• The common IO devices include card readers, tape
drives, and line printers.
• To speed up processing, operators batched
together jobs with similar needs and ran them
through the computer as a group.
• The OS is simple: it only needs to automatically
transfer control from one job to the next.
4
Batch Systems
• Speed(CPU) >> speed(IO: card readers) => CPU
is constantly idle.
• After the introduction of disk technology, the OS can keep all
jobs on a disk instead of a serial card reader, and
can perform job scheduling (Ch. 6) to perform
tasks more efficiently.
• Multiprogramming: the OS keeps several jobs in
memory simultaneously, interleaving CPU and IO
operations among different jobs to maximize
CPU utilization.
5
Batch Systems
• Real-life example: a lawyer handles multiple cases for
many clients.
• Multiprogramming is the first instance where OS
must make decisions for the users: job scheduling
and CPU scheduling.
6
1.3 Time-Sharing Systems
• Time sharing or multitasking: the CPU executes
multiple jobs by switching among them, but the
switches occur so quickly and so frequently that
each user can interact with each program while it is
running (the user thinks that he/she is the only
user).
• A time-sharing OS uses CPU scheduling and
multiprogramming to provide each user with a
small portion of a time-shared computer.
• Process: a program that has been loaded into memory and
is executing.
7
Time-Sharing Systems
• Need memory management and protection
methods (Ch. 9)
• Virtual memory (Ch. 10)
• File systems (Ch. 11)
• Disk management (Ch. 13)
• CPU scheduling (Ch. 6)
• Synchronization and communication (Ch. 7)
8
1.4 PC Systems
• MS-DOS, Microsoft-Window, Linux, IBM OS/2,
Macintosh OS
• Mainframe (MULTICS:MIT) => minicomputers
(DEC:VMS, Bell-Lab:UNIX) => microcomputers
=> network computers
• Personal workstation: a large PC (SUN, HP, IBM:
Windows NT, UNIX)
• PCs are mainly single-user systems: no resource
sharing is needed; due to internet access,
security and protection are needed
9
1.5 Parallel Systems
• Multiprocessor systems: tightly coupled systems
• Why? 1) improved throughput, 2) cost savings
due to resource sharing (peripherals, storage, and
power), and 3) increased reliability (graceful
degradation, fault tolerance)
• Symmetric multiprocessing: each processor runs
an identical OS, needs communication between
processors
• Asymmetric multiprocessing: one master control
processor, master-slave
10
Parallel Systems
• Back-ends
• => microprocessors become inexpensive
• => using additional microprocessors to off-load
some OS functions (e.g., using a microprocessor
system to control disk management)
• a kind of master-slave multiprocessing
11
1.6 Real-Time Systems
• There are rigid time requirements on the operation
of a processor or control/data flow
• Hard real-time systems: the critical tasks must be
guaranteed to be completed on time
• Soft real-time systems: a critical real-time task
gets priority over other tasks
12
1.7 Distributed Systems
• Internet and WWW
• TCP/IP and PPP
• Distributed systems: loosely coupled systems
• Network OS: a PC running a network OS acts
autonomously (Ch. 14)
• Distributed OS: a less autonomous environment
(Ch. 16)
13
Ch. 2 Computer-System
Structures
(Figure: a common computer-system organization: the CPU and the disk, printer, tape-drive, and memory controllers all connected to shared memory through the system bus; the controllers drive disks, printers, and tape drives)
14
2.1 Computer-System Operation
• Bootstrap program
• Modern OSs are interrupt driven
• Interrupt vector: interrupted device address,
interrupt request, and other info
• System call (e.g., performing an I/O operation)
• Trap
15
2.2 I/O Structure
• SCSI (small computer-systems interface): can
attach seven or more devices
• Synchronous I/O: I/O requested => I/O started =>
I/O completed => returned control to user program
• Asynchronous I/O: I/O requested => I/O started
=> control returned to the user program without
waiting for the completion of the I/O operation
• Device-status table: indicates the device’s type,
address, and state (busy, idle, not functioning)
16
I/O Structure
• DMA (Direct Memory Access)
• Data transfer for high-speed I/O devices and main
memory
• Transfers an entire block with only one interrupt
(no CPU intervention for each byte/word)
• Cycle-stealing
• A back-end microprocessor?
17
2.3 Storage Structure
• Main memory: RAM (SRAM and DRAM)
• von Neumann architecture: instruction register
• Memory-mapped I/O, programmed I/O (PIO)
• Secondary memory
• Magnetic disks, floppy disks
• Magnetic tapes
18
2.4 Storage Hierarchy
(FIG)
• Bridging speed gap
• registers=>cache=>main memory=>electronic
disk=>magnetic disk=>optical disk=>magnetic
tapes
• Volatile storage: data lost when power is off
• Nonvolatile storage: storage systems below
electronic disk are nonvolatile
• Cache: small size but fast (cache management: hit
and miss)
• Coherency and consistency
19
2.5 Hardware Protection
• Resource sharing (multiprogramming) improves
utilization but also increases problems
• Many programming errors are detected by the
hardware and reported to OS (e.g., memory fault)
• Dual-mode operation: user mode and monitor
mode (also called supervisor, system or privileged
mode: privileged instructions): indicated by a
mode bit.
• Whenever a trap occurs, the hardware switches
from user mode to monitor mode
20
Hardware Protection
• I/O protection: all I/O instructions should be
privileged instructions. The user can only perform
I/O operation through the OS.
• Memory protection: protect the OS from access by
user programs, and protect user programs from each
other: base and limit registers (see the sketch below).
• CPU protection: A timer to prevent a user program
from getting stuck in an infinite loop.
21
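A minimal sketch (not from the slides; names and addresses are illustrative) of the base/limit check the memory-protection hardware performs on every user-mode memory reference:

#include <stdio.h>
#include <stdbool.h>

/* Every address generated in user mode is compared against the base
 * and limit registers; a violation traps to the OS (monitor mode). */
static unsigned long base_reg  = 0x30000;   /* start of the user region (assumed value) */
static unsigned long limit_reg = 0x12000;   /* size of the user region (assumed value) */

bool legal_access(unsigned long addr) {
    return addr >= base_reg && addr < base_reg + limit_reg;
}

int main(void) {
    unsigned long addr = 0x30500;
    if (legal_access(addr))
        printf("access 0x%lx OK\n", addr);
    else
        printf("access 0x%lx -> trap to monitor mode (addressing error)\n", addr);
    return 0;
}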
CH. 3 OS Structure
• Examining the services that an OS provides
• Examining the interface between the OS and users
• Disassembling the system into components and
their interconnections
• OS components:
=> Process management
=> Main-memory management
=> File management
=> I/O-system management
=> Secondary-storage management
=> Networking
=> Protection system
=> Command-interpreter
22
3.1 System Components
Process Management
• Process: a program in execution (e.g., a compiler,
a word-processing program)
• A process needs certain resources (e.g., CPU,
memory, files and I/O devices) to complete its
task. When the process terminates, the OS will
reclaim any reusable resources.
• OS processes and user processes: The execution of
each process must be sequential. All the processes
can potentially execute concurrently, by
multiplexing the CPU among them.
23
Process Management
The OS should perform the following tasks:
• Creating and deleting processes
• Suspending and resuming processes
• Providing mechanisms for process
synchronization
• Providing mechanisms for process communication
• Providing mechanisms for deadlock handling
• => Ch. 4- Ch. 7
24
Main-Memory Management
• Main memory is a repository of quickly accessible
data shared by the CPU and I/O devices (Store
data as well as program)
• Using absolute address to access data in the main
memory
• Each memory-management scheme requires its
own hardware support
• The OS is responsible for the following tasks:
=> Tracking which parts of memory are currently used and by whom
=> Deciding which processes should be loaded into memory
=> Allocating and deallocating memory as needed
25
File Management
• Different I/O devices have different characteristics
(e.g., access speed, capacity, access method) and physical properties
• File: a collection of related information defined
by its creator. The OS provides a logical view of
information storage (the FILE) regardless of its physical
properties
• Directories => files (organizer) => access right for
multiple users
26
File Management
The OS should be responsible for:
• Creating and deleting files
• Creating and deleting directories
• Supporting primitives for manipulating files and
directories
• Mapping files onto secondary storage
• Backing up files on nonvolatile storage
• => Ch. 11
27
I/O-System Management
• An OS should hide the peculiarities of specific
hardware devices from the user
• The I/O subsystem consists of:
• A memory-management component including
buffering, caching, and spooling
• A general device-driver interface
• Drivers for specific hardware devices
28
Secondary-Storage Management
• Most modern computer systems use disks as the
principal on-line storage medium, for both
programs and data
• Most programs are stored on a disk and are loaded
into main memory whenever they are needed
• The OS is responsible for:
=> Free-space management
=> Storage allocation
=> Disk scheduling
=> Ch. 13
29
Networking
• Distributed system: a collection of independent
processors that are connected through a
communication network
• FTP: file transfer protocol
• WWW: NFS (network file system protocol)
• http:
• => Ch. 14- Ch. 17
30
Protection System
• For a multi-user/multi-process system: processes
executions need to be protected
• Any mechanisms for controlling the access of
programs, data, and resources
• Authorized and unauthorized access and usage
31
Command-Interpreter System
• OS (kernel) <=> command interpreter (shell) <=>
user
• Control statements
• A mouse-based windowing OS:
• Clicking an icon: depending on the mouse pointer's
location, the OS can invoke a program or select a
file or a directory (folder).
32
3.2 OS Services
• Program execution
• I/O operation
• File-system manipulation
• Communications
• Error detection
• Resource allocation
• Accounting
• Protection
33
3.3 System Calls
• System calls: the interface between a process and
the OS
• Typically available as assembly-language instructions.
• Can also be invoked from a higher-level language
program (C, C++ for UNIX; Java with C/C++)
• Ex. Copy one file to another: how can system
calls be used to perform this task? (see the sketch below)
• Three common ways to pass parameters to the OS:
register, block, stack (push/pop).
34
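A minimal sketch (an assumed example, not from the slides) of the copy-file task expressed directly as UNIX system calls (open, read, write, close); each call traps into the OS:

#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst one block at a time using only system calls. */
int copy_file(const char *src, const char *dst) {
    char buf[4096];
    ssize_t n;
    int in  = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0)
        return -1;                              /* open failed */
    while ((n = read(in, buf, sizeof buf)) > 0) /* read a block from the source */
        write(out, buf, n);                     /* write it to the destination */
    close(in);
    close(out);
    return 0;
}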
System Calls
Five major categories:
• Process control
• File manipulation
• Device manipulation
• Information maintenance
• Communications
35
Process Control
• End, abort:
=>Halt the execution normally (end) or abnormally
(abort)
=> Core dump file: debugger
=>Error level and possible recovery
• Load, execute
=> When to load/execute? Where to return the
control after it’s done?
• Create/terminate process
=> When? (wait time/event)
36
Process Control
• Get/set process attributes
=> Core dump file for debugging
=> A time profile of a program
• Wait for time, event, single event
• Allocate and free memory
• The MS-DOS: a single tasking system
• Berkeley UNIX: a multitasking system (using fork
to start a new process)
37
File Management
• Create/delete file
• Open, close
• Read, write, reposition (e.g., to the end of the file)
• Get/set file attributes
38
Device Management
• Request/release device
• Read, write, reposition
• Get/set device attributes
• Logically attach and detach devices
39
Information Maintenance
• Get/set time or date
• Get/set system data (e.g., OS version, free
memory space)
• Get/set process, file, or device attributes (e.g.,
current users and processes)
40
Communications
• Create, delete communication connection:
message-passing and shared-memory model
• Send, receive messages: host name (IP name),
process name
Daemons: source (client)<->connection<->the
receiving daemon (server)
• Transfer status information
• Attach or detach remote devices
41
3.4 System Programs
• OS: a collection of system programs include file
management, status information, file modification,
programming-language support, program loading
and execution, and communications.
• Os is supplied with system utilities or application
programs (e.g., web browsers, compiler, wordprocessors)
• Command interpreter: the most important system
program
=> contains code to execute the command
=> UNIX: command -> to a file, load the file into memory and execute
rm G => search the file rm => load the file => execute it with the
parameter G
42
3.5 System Structure
(Simple Structure)
FIG3.6
• MS-DOS: application programs are able to
directly access the basic I/O routine (8088 has no
dual mode and no hardware protection) => errant
programs may cause entire system crashes
• UNIX: the kernel and the system programs.
• System calls define the application programmer
interface (API) to UNIX
FIG3.7
43
Layered Approach
• Layer 0 (the bottom one): the hardware, layer N
(the top one): the user interface
• The main advantage of the layer approach:
modularity
Pro: simplify the design and implementation
Con: not easy to appropriately define the layers
less efficient
• Windows NT: a highly layer-oriented organization
=> lower performance compared to Windows 95
=> Windows NT 4.0 => moving layers from user
space to kernel space to improve the performance
44
Microkernels
• Carnegie Mellon Univ (1980s): Mach
• Idea: removing all nonessential components from
the kernel, and implementing them as system and
user-level programs.
• Main function: microkernel provides a
communication facility (message passing)
between the client program and various services
(running in user space)
• Ease of extending the OS: new services are added
in user space, with no change to the kernel
45
Microkernels
• Easy to port; more security and reliability (most
services run as user processes; if a service fails, the
rest of the OS remains intact)
• Digital UNIX
• Apple MacOS Server OS
• Windows NT: a hybrid structure
FIG 3.10
46
Virtual Machines
• VM: IBM FIG 3.11
• Each process is provided with a (virtual) copy of
the underlying computer
• Major difficulty: disk systems => minidisks
Implementation:
• Difficult to implement: switch between a virtual
user and a virtual monitor mode
• Less efficient in run time
47
Virtual Machines
Benefits:
• The environment provides complete protection of the
various system resources (but no direct sharing of
resources)
• A perfect vehicle for OS research and development
• No system-development time is needed: system
programmers can work on their own virtual
machines to develop the system
• MS-DOS (Intel) <=> UNIX (SUN)
• Apple Macintosh (68000) <=> Mac (old 68000)
• Java
48
Java
• Java: a technology rather than a programming
language : SUN : late 1995
• Three essential components:
=> Programming-language specification
=> Application-programming interface (API)
=> Virtual-machine specification
49
Java
Programming language
• Object-oriented, architecture-neutral, distributed
and multithreaded programming language
• Applets: programs with limited resource access
that run within a web browser
• A secure language (running on distributed
network)
• Performs automatic garbage collection
50
Java
API
• Basic language: support for graphics, I/O, utilities
and networking
• Extended language: support for enterprise,
commerce, security and media
Virtual machine
• JVM: a class loader and a Java interpreter
• Just-in-time compiler: turns the architecture-neutral bytecodes into native machine language
for the host computer
FIG3.12
51
Java
FIG 3.13
• The Java platforms: JVM and Java API => make it
possible to develop programs that are architecture
neutral and portable
• Java development environment: a compile-time
and a run-time environment
FIG 3.15
52
3.8 System Design and
Implementation
• Define the goals and specification
• User goals (wish list) and system goals
(implementation concerns)
• The separation of policy (what should be done)
and mechanism (how to do it)
• Microkernel: implementing a basic set of policy-free primitive building blocks
• Traditionally, the OS is implemented in assembly
language (better performance, but portability is a
problem)
53
System Design and
Implementation
High-level language implementation
• Easy porting but slow speed with more storage
• Need better data structures and algorithms
• MULTICS (ALGOL); UNIX, OS/2, Windows (C)
• Non critical (HLL), critical (assembly language)
• System generation (SYSGEN): to create an OS for
a particular machine configuration (e.g., CPU?
Memory? Devices? Options?)
54
Part II: Process Management
Ch. 4 Processes
4.1 Process Concept
• Process (job) is a program in execution
• Ex. On a single-user system (PC), the user can run
multiple processes (jobs), such as a web browser, a word processor, and a CD player, simultaneously
• Two processes may be associated with the same
program. Ex. You can invoke an editor twice to
edit two files (two processes) simultaneously
55
Process Concept
Process state:
• Each process may be in one of the 5 states: new,
running, waiting, ready, and terminated
(Figure: process state diagram: new -> ready (admitted); ready -> running (scheduler dispatch); running -> terminated (exit); running -> ready (interrupt); running -> waiting (IO or event wait); waiting -> ready (IO or event completion))
56
Process Concept
FIG 4.2
Process Control Block (PCB): represents a process
• Process state: new, ready, running, waiting or exit
• Program counter: points to the next instruction to
be executed for the process
• CPU registers: when an interrupt occurs, the data
needs to be stored to allow the process to be
continued correctly
• CPU-scheduling information: process priority
(Ch.6)
• Memory-management information: the values of
base and limit registers, the page tables...
57
Process Concept
• Accounting information: account number, process
number, time limits…
• IO status information: a list of IO devices
allocated to the process, a list of open files….
Threads
FIG 4.3
• Single thread: a process is executed with one
control/data flow
• Multi-thread: a process is executed with multiple
control/data flows (e.g., in an editor, one thread
can handle typing while another performs spell checking
at the same time)
58
4.2 Process Scheduling
• The objective of multiprogramming: maximize the
CPU utilization (keep the CPU running all the
time)
Scheduling queues
• Ready queue (usually a linked list): the processes
that are in the main memory and ready to be
executed
• Device queue: the list of processes waiting for a
particular IO device
FIG 4.4
59
Process Scheduling
• Queueing diagram (Figure: processes wait in the ready queue until dispatched to the CPU; an I/O request sends a process to an I/O queue and back to the ready queue when the I/O completes; a process also returns to the ready queue when its time slice expires, when a forked child executes, or after waiting for an interrupt)
60
Process Scheduling
Scheduler
• Long-term scheduler (job scheduler): selects
process from a pool and loads them into main
memory for execution (less frequent and has
longer-time to make a more careful selection
decision)
• Short-term scheduler (CPU scheduler): selects
among processes for execution (runs more frequently and
must be fast)
• The long-term scheduler controls the degree of
multiprogramming (the # of processes in memory)
61
Process Scheduling
• IO-bound process
• CPU-bound process
• If all processes are IO-bound => the ready queue
will almost always be empty => the short-term scheduler has
nothing to do
• If all processes are CPU-bound => the IO-waiting
queue will almost always be empty => devices will be unused
• Balanced system performance = a good mix of IO-bound and CPU-bound processes
62
Process Scheduling
FIG 4.6
• The medium-term scheduler: using swapping to
improve the process mix
• Context switching: switching the CPU to a new
process => saving the state of the suspended
process AND loading the saved state for the new
process
• Context-switching time is pure overhead and
depends heavily on hardware support
63
4.3 Operations on Processes
Process creation
• A process may create several new processes:
parent process => children processes (tree)
• Subprocesses may obtain resources from their
parent (which may overload the parent) or from the OS
When a process creates a new one, there are two execution possibilities:
1. The parent and the new one run concurrently
2. The parent waits until all of its children have
terminated
64
4.3 Operations on Processes
In terms of the address space of the new process
1. The child process is a duplicate of the parent
process
2. The child process has a program loaded into it
• In UNIX, each process has a process identifier.
“fork” system call to create a new process (it
consists of a copy of the address space of the
original process) Advantage? Easy communication
between the parent and children processes.
65
4.3 Operations on Processes
• “execlp” system call (after “fork”): replaces the
process's memory space with a new program

#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();                 /* create a new process */
    if (pid < 0) return 1;              /* fork failed */
    if (pid == 0)
        execlp("/bin/ls", "ls", NULL);  /* child: overlay itself with UNIX "ls" */
    else
        wait(NULL);                     /* parent: wait for the child to complete */
    return 0;
}
66
4.3 Operations on Processes
Process termination
• “exit”: system call used by a process to terminate
• Cascading termination: when a process terminates,
all its children must also be terminated
67
4.4 Cooperating Processes
• Independent and cooperating processes
• Any process that shares data with other processes is a
cooperating process
WHY is process cooperation needed?
• Information sharing
• Computation speedup (e.g., parallel execution of
CPU and IO)
• Modularity: dividing the system functions into
separate processes
68
4.4 Cooperating Processes
• Convenience: for a single-user, many tasks can be
executed at the same time
• Producer-consumer
• Unbounded/bounded-buffer
• The shared buffer: implemented as a circular array (see the sketch below)
69
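A minimal sketch (assuming a single producer and a single consumer, busy-waiting for simplicity) of the shared bounded buffer implemented as a circular array:

#define BUFFER_SIZE 8

typedef struct { int data; } item;

static item buffer[BUFFER_SIZE];
static int in = 0;    /* next free slot (written by the producer) */
static int out = 0;   /* next full slot (read by the consumer) */

/* Producer: busy-waits while the buffer is full, then inserts an item. */
void produce(item it) {
    while (((in + 1) % BUFFER_SIZE) == out)
        ;                               /* buffer full: do nothing */
    buffer[in] = it;
    in = (in + 1) % BUFFER_SIZE;
}

/* Consumer: busy-waits while the buffer is empty, then removes an item. */
item consume(void) {
    while (in == out)
        ;                               /* buffer empty: do nothing */
    item it = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    return it;
}

With this scheme at most BUFFER_SIZE - 1 items can be in the buffer at once; the synchronization chapters (Ch. 7) revisit how to avoid the busy waiting.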
4.5 Interprocess Communication
(IPC)
Message-passing system
• “send” and “receive”
• Fixed or variable size of messages
Communication link
• Direct/indirect communication
• Symmetric/asymmetric communication
• Automatic or explicit buffering
• Send by copy or by reference
• Fixed or variable-sized messages
70
4.5 Interprocess Communication
(IPC)
Naming
Direct communication (two processes link)
• symmetric in addressing: send(p, message),
receive(q, message): explicit name of the recipient
and sender
• asymmetric in addressing: send(p, message),
receive(id, message): variable id is set to the name
• Disadvantage: limited modularity of the process
definition (all the old names need to be found
before it can be modified; not suitable for separate
compilation)
71
4.5 Interprocess Communication
(IPC)
Indirect communication
• using mailboxes or ports
• Supporting multi-processes link
• A mailbox may be owned by the process (when the
process terminates, the mailbox disappears), or
• owned by the OS, which must allow the process to:
create a new mailbox, send/receive
messages via the mailbox, and delete the mailbox
72
4.5 Interprocess Communication
(IPC)
Synchronization
• Blocking/nonblocking send and receive
• Blocking (synchronous) vs. nonblocking
(asynchronous)
• A rendezvous occurs between the sender and receiver
when both are blocking
Buffering
• Zero/bounded/unbounded capacity
73
Mach
• Message based: using ports
• When a task is created: two mailboxes, the Kernel
(kernel communication) and the Notify
(notification of event occurrences) ports are
created
• Three system calls are needed for message
transfer: msg_send, msg_receive, and msg_rpc
(Remote Procedure Call)
• Mailbox: initially an empty queue; FIFO order
• Message: fixed-length header, variable-length data
74
Mach
• If the mailbox is full, the sender has 4 options:
1. Wait indefinitely until there is a free room
2. Wait for N ms
3. Do not wait, just return immediately
4. Temporarily cache a message
• The receiver must specify the mailbox or the
mailbox set
• The Mach was designed for distributed systems
75
Windows NT
• Employs modularity to increase functionality and
decrease the implementation time for adding new
features
• NT supports multiple OS subsystems: message
passing (called local procedure-call facility (LPC))
• Using ports for communications: connection port
(by client) and communication port (by server)
• 3 types of message-passing techniques:
1. 256-byte queue
2. Large message via shared memory
3. Quick LPC (64k)
76
Ch. 5 Thread
5.1 Overview
FIG 5.1
• A lightweight process: a basic unit of CPU
utilization
• A heavyweight process: a single thread of control
• Multithreading is common practice: e.g., a web browser has one
thread displaying text/images and another
retrieving data from the network
• When a single application needs to perform
several similar tasks (e.g., a web server accepting
many clients' requests), using threads is more
efficient than using processes.
77
5.2 Benefits
4 main benefits:
• Responsiveness: allows a program to continue
running even if part of it is blocked or performing a
lengthy operation
• Resource sharing: memory and code
• Economy: allocating memory and resources for a
process is more expensive (in Solaris, creating a
process is 30 times slower, and context switching is 5
times slower)
• Utilization of multiprocessor architectures (on a
single processor, only one thread runs at a
time)
78
5.3 User and Kernel Threads
User thread
• Supported by a thread library at the user level that provides
thread creation, scheduling, and management with
no kernel support
• Advantage: fast
• Disadvantage: if the kernel is single-threaded, any
user-level thread making a blocking system call
blocks the entire process
• POSIX Pthreads, Mach C-threads, Solaris threads
79
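A minimal sketch (not from the slides) of creating and joining a thread with the POSIX Pthreads library mentioned above:

#include <pthread.h>
#include <stdio.h>

/* Function executed by the new thread. */
void *worker(void *arg) {
    printf("hello from thread %d\n", *(int *)arg);
    return NULL;
}

int main(void) {
    pthread_t tid;
    int id = 1;
    pthread_create(&tid, NULL, worker, &id);  /* create the thread */
    pthread_join(tid, NULL);                  /* wait for it to finish */
    return 0;
}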
User and Kernel Threads
Kernel threads
• Supported by the OS
• It’s slower than user threads
• If a thread performs a blocking system call, the kernel
can schedule another thread in the application for
execution
• Windows NT, Solaris, Digital UNIX
80
5.4 Multithreading Models
Many-to-one model: many user-level threads map to one kernel thread
• Only one user thread can access the kernel thread
at a time => cannot run in parallel on
multiprocessors
One-to-one model
• More concurrency (allows parallel execution)
• Overhead: one kernel thread is created for each user thread
Many-to-many model
• The # of kernel threads => specific to a particular
application or machine
• It doesn't suffer the drawbacks of the other two
models
81
5.5 Solaris Threads
FIG 5.5
• Until 1992, Solaris supported only a single thread of control
• Now it supports kernel- and user-level threads, symmetric
multiprocessing, and real-time scheduling
• Intermediate level of threads: user-level
<=> lightweight processes (LWP) <=> kernel-level
• Many-to-many model
• User-level threads: bound (permanently
attached to an LWP) or unbound (multiplexed onto
the pool of available LWPs)
82
Solaris Threads
• Each LWP is connected to one kernel-level thread,
whereas each user-level thread is independent of
the kernel
83
5.6 Java Threads
• Support thread creation and management at the
language level
Thread creation
• Create a new class derived from the Thread class
• Define a class that implements the Runnable
interface
Thread management
FIG 5.10
• suspend(), sleep(), resume(), stop()
Thread states
• New, Runnable, Blocked, Dead
84
Ch. 6 CPU Scheduling
6.1 Basic Concepts
• The objective of multiprogramming: maximize the
CPU utilization
• Scheduling: the center of OS
• CPU-IO burst cycle: IO-bound program->many
short CPU bursts, CPU-bound program->few very
long CPU bursts
• CPU scheduler: short-term scheduler
• Queue: FIFO, priority, tree or a linked list
Preemptive scheduling
CPU scheduling decisions depend on:
85
Basic Concepts
1. A process from running to waiting state
2. A process from running to ready state
3. A process from waiting to ready state
4. A process terminates
• When 1 or 4 occurs, a new process must be selected for
execution, but not necessarily for 2 and 3
• A scheduling scheme that acts only on 1 and 4 is called
nonpreemptive or cooperative (once the CPU is
allocated to a process, the process keeps the CPU
until it terminates or moves to the waiting state)
86
Basic Concepts
• The preemptive scheduling scheme needs to
consider how to swap the process execution and
maintain the correct execution (Context switching)
Dispatcher: gives control of the CPU to a newly
selected process
• Switching context
• Switching to user mode
• Jumping to the proper location in the user program and
restarting it
• Dispatch latency: the time between stopping the old
process and starting the new one
87
6.2 Scheduling Criteria
• CPU utilization
• Throughput: the # of processes completed/per
unit-time
• Turnaround time: submission of a process to its
completion
• Waiting time: the sum of the periods spent
waiting in the ready queue
• Response time: for interactive systems (minimizing the
variance of the response time is more important
than minimizing the average response time)
88
6.3 Scheduling Algorithms
• Comparison of the average waiting times (see the worked example below)
FCFS (first-come, first-served)
• Convoy effect: all other processes wait for one big
process to get off the CPU
• The FCFS scheduling algorithm is nonpreemptive
P. 141, 142
89
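A worked example in the style of the textbook pages cited above (the burst times are the usual illustration, not taken from these slides): with processes P1 = 24 ms, P2 = 3 ms, P3 = 3 ms arriving in the order P1, P2, P3, FCFS gives waiting times 0, 24, and 27 ms, so the average waiting time is (0 + 24 + 27) / 3 = 17 ms; arriving in the order P2, P3, P1, the waiting times are 0, 3, and 6 ms, for an average of only 3 ms, which illustrates the convoy effect.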
Scheduling Algorithms
SJF(Shortest-job-first scheduling)
• Provably optimal
• Difficulty: how to know the length of the next
CPU burst???
• Used frequently in long-term scheduling
P. 143
90
Scheduling Algorithms
• Prediction: exponential average (see the formula below)
P. 143, 144
• Preemptive SJF: shortest-remaining-time-first
P. 145
91
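The prediction usually cited here (a standard formula, stated as an assumption rather than quoted from the slides) is the exponential average tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n, where t_n is the length of the nth actual CPU burst, tau_n is the previous prediction, and 0 <= alpha <= 1 weights recent history against the past; with alpha = 1/2 the measured burst and the old prediction count equally.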
Scheduling Algorithms
Priority scheduling
• Priorities can be defined internally (some
measure such as time or memory size) or externally
(specified by the users)
• Either preemptive or nonpreemptive
• Problem: starvation (low-priority process will
never be executed)
• Solution: aging (increase priority over time)
P. 146
92
Scheduling Algorithms
Round-robin (RR) scheduling
• Suitable for time-sharing systems
• Time quantum: circular queue of processes
• The average waiting time is often long
• The RR scheduling algorithm is preemptive
P. 147, 148
93
Scheduling Algorithms
• Performance => size of the time quantum=>
extremely large (=FCFS) => extremely small
(processor sharing)
• Rule of thumb => 80% of CPU bursts should be
shorter than the time quantum
• Performance => context switch effect => time
quantum > time(context switching)
• Turnaround time => size of the time quantum
94
Scheduling Algorithms
FIG 6.6
Multilevel queue scheduling
• Priority: foreground (interactive) processes >
background (batch) processes
• Partitions the ready queue into several separate
queues
• The processes are permanently assigned to a
queue based on some properties of the process
(e.g., process type, memory size…)
• Each queue has its own scheduling algorithm
• Scheduling between queues: 1) fixed-priority
preemptive scheduling, 2) time slices between
queues
95
Scheduling Algorithms
Multilevel feedback-queue scheduling
• Allow a process to move between queues
• The idea is to separate processes with different
CPU-burst characteristics (e.g., move the process
using too much CPU to a lower-priority)
• What are considerations for such decisions?
96
6.4 Multiple-Processor
Scheduling
• Homogeneous: all processors are identical
• Load sharing among processors
• Symmetric multiprocessing (SMP): each processor
is self-scheduling, it examines a common ready
queue and select a process to execute (what’re the
main concern?)
• Asymmetric multiprocessing: a master server is
handling all scheduling decisions
97
6.5 Real-Time Scheduling
• Hard real-time: resource reservation (impossible
using a secondary memory or virtual memory)
• It requires a special-purpose software running on
hardware dedicated to the critical process to
satisfy the hard real-time constraints
• Soft real-time: guarantee critical processes having
higher priorities
• The system must have priority scheduling and the
real-time processes must have the highest priority,
and will not degrade with time
• The dispatch latency must be short. HOW?
98
Real-Time Scheduling
• Preemption points in long-duration system calls
• Making the entire kernel preemptible
• What if a high-priority process needs to
read/modify kernel data which is currently used by
a low-priority process? (Priority inversion)
• Priority-inheritance protocol: the processes that
are accessing resources that the high-priority
process needs will inherit the high-priority and
continue running till they all complete
99
6.6 Thread Scheduling
• User-level threads: process local scheduling
• Kernel-level threads: system global scheduling
Solaris 2 scheduling
• Priority-based process scheduling
• 4 classes in order of priority: real-time, system,
(time sharing & interactive)
• The higher the priority, the smaller the time slice,
and vice versa
100
6.7 Java Thread Scheduling
• Preemptive, priority-based scheduling
• Whether threads are time-sliced is up to the
particular implementation of the JVM
• yield(): relinquishes control of the CPU
• Cooperative multitasking: a thread voluntarily
yielding control of the CPU
• Threads are given a default priority, which can be
changed explicitly by the program, the JVM does
not dynamically alter priorities
• A time-sliced, round-robin thread scheduler
101
6.8 Algorithm Evaluation
• Deterministic modeling: analytic evaluation (given
predetermined workloads and based on that to
define the performance of each algorithm)
• Queueing models: limited theoretical analysis
• Simulations: random-number generator, it may be
inaccurate due to assumed distribution (defined
empirically or mathematically). Solution: trace
tapes (monitoring the real system)
• Implementation: most accurate but with high cost.
P. 162, 163
102
Ch. 7 Process Synchronization
7.1 Background
• Why?
• Threads: share a logical address space
• Processes: share data and code
• They have to wait in line until their turn
• Race condition
P. 173-175
103
7.2 Critical-Section Problem
• Critical section: a thread has a segment of code in
which the thread may change the common data
A solution to the critical-section problem must
satisfy:
• Mutual exclusion
• Progress
• Bounded waiting
104
7.3 Two-Task Solutions
• Two tasks, T0 and T1, each need to enter a critical section (CS). HOW?
Alg 1: using a shared "turn" variable
(Figure: flowchart: Ti loops while turn != i, enters the CS, then sets turn to the other task's index)
• What's the problem? What if turn=0 and T0 is in its non-critical
section while T1 needs to enter the critical section?
Is the progress requirement satisfied?
105
Two-Task Solutions
Alg 1 (variant): using "turn" and yield()
(Figure: flowchart: Ti checks turn; while it is not its turn it waits; the task holding the turn checks whether it needs to enter the CS, and if not it calls yield() and passes the turn, otherwise it enters the CS and then sets turn to the other task)
• What's the problem? It does not retain sufficient
info about the state of each thread (it records only which thread is
allowed to enter the CS).
How to solve this problem?
106
Two-Task Solutions
Alg 2: using a flag array to replace "turn"
a0, a1: "1" indicates that the corresponding task is ready to enter the CS
(Figure: flowchart: Ti sets ai=1, waits while the other task's flag is 1, enters the CS, then resets ai=0)
• Is mutual exclusion satisfied? Yes
• Is progress satisfied? No
• What if both T0 and T1 set their flags a0 and a1 to "1" at the same
time? Both loop forever!!!
107
Two-Task Solutions
Alg 3: satisfying all three requirements (combining the flag array with "turn")
(Figure: flowchart:
T0: a0=1; turn=1; wait while (a1=1 && turn=1); CS; a0=0
T1: a1=1; turn=0; wait while (a0=1 && turn=0); CS; a1=0)
108
7.4 Synchronization Hardware
• Test-and-Set: an indivisible instruction. If two Test-and-Set
instructions are executed simultaneously,
they will be executed sequentially in some
arbitrary order (flag and turn) (see the sketch below)
• Swap instruction (yield())
109
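A minimal sketch (not from the slides; it uses the GCC __sync_lock_test_and_set builtin as the indivisible test-and-set) of a simple lock built on test-and-set:

#include <stdio.h>

static volatile int lock = 0;       /* 0 = free, 1 = held */

/* Spin until the previous value returned by the atomic test-and-set is 0. */
void acquire(volatile int *l) {
    while (__sync_lock_test_and_set(l, 1))
        ;                           /* busy wait */
}

void release(volatile int *l) {
    __sync_lock_release(l);         /* atomically store 0 */
}

int main(void) {
    acquire(&lock);
    printf("in the critical section\n");
    release(&lock);
    return 0;
}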
7.5 Semaphores
• A general method to handle binary or multi-party synchronization
• Two operations, P (test) and V (increment), which must be
executed indivisibly
• P(S) { while (S <= 0) ; S--; }
• V(S) { S++; }
• Binary semaphore: 0 and 1
• Counting semaphore: resource allocation
110
Semaphores
• Busy waiting: wasting CPU resources
• Spinlock (semaphore): no context switching is
required when the process is waiting on a lock
• One solution: when a process executes a P operation and
the semaphore value becomes negative, it blocks itself rather than
busy waiting (see the sketch below)
• Wakeup operation: wait state => ready state
• P(S) { value--; if (value < 0) { add this process to a
list; block; } }
• V(S) { value++; if (value <= 0) { remove a process P
from the list; wakeup(P); } }
111
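A minimal sketch of the blocking semaphore described above (block() and wakeup() are assumed OS primitives, and the bodies of P and V must themselves still execute indivisibly):

typedef struct process process;      /* opaque PCB type (assumed) */

typedef struct {
    int value;                       /* if negative, |value| = number of waiters */
    process *list;                   /* queue of processes blocked on this semaphore */
} semaphore;

void P(semaphore *S) {
    S->value--;
    if (S->value < 0) {
        /* add the calling process to S->list */
        /* block();  switch the calling process to the waiting state */
    }
}

void V(semaphore *S) {
    S->value++;
    if (S->value <= 0) {
        /* remove a process p from S->list */
        /* wakeup(p);  move p from the waiting state to the ready state */
    }
}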
Semaphores
• If the semaphore value is negative, the value
indicates the # of processes waiting on the
semaphore
• The waiting can be implemented by: linked list, a
FIFO queue (ensure bounded waiting), or???
• The semaphore should be treated as a critical
section:
1. Uniprocessor: inhibit interrupts
2. Multiprocessor: alg 3 (SW) or hardware
instructions
112
Semaphores
• Deadlock
• Indefinite blocking or starvation
P0: P(S); P(Q); ... V(S); V(Q)
P1: P(Q); P(S); ... V(Q); V(S)
If P0 holds S and waits for V(Q) from P1, while P1 holds Q and waits for V(S) from P0 => deadlock
113
7.6 Classical Synchronization
Problems
• The bounded-buffer problem
• The readers-writers problem: read-write conflict in
database
• The dining-philosophers problem
Homework exercises!!!
114
7.7 Monitors
• Programming mistakes will cause a semaphore to malfunction:
mutex.V(); criticalsection(); mutex.P(); ==> several processes may be executing in their CS simultaneously!
mutex.P(); CS(); mutex.P(); ==> a deadlock will occur
• If a process misses P(), V(), or both, mutual exclusion is violated or a
deadlock will occur
115
Monitors
(Figure: process P signals condition x (x.signal) while process Q is suspended on condition x)
• A monitor: a set of programmer-defined
operations that are provided mutual exclusion
within the monitor (the monitor construct prohibits
concurrent access to all procedures defined within
the monitor)
• Condition variables: x.wait and x.signal
• Signal-and-wait: P waits until Q leaves the monitor,
or waits on another condition
• Signal-and-continue: Q waits until P leaves the
monitor, or waits on another condition
116
7.8 Java Synchronization
Bounded buffer
• Busy waiting with yield() may cause deadlock:
Assume the JVM uses priority-based scheduling.
P (high priority) needs buf to run; Q (low priority) holds buf
but cannot execute because its priority is lower than P's
=> Q cannot release buf => deadlock
117
Java Synchronization
• Race condition: every object in Java is associated
with a single lock
• Entry set: wait for the lock to become available
• enter(), remove(), wait(), notify()
Possible deadlock:
1. Buf:full, consumer:sleeping
2. Producer:enter() => buf:full => yield() but own the lock
3. Consumer: awaken => remove() => no lock => deadlock
• Wait set: release lock and wait
• notify():
1. Pick a thread T from the wait set to the entry set
2. Sets the state of T from blocked to runnable
Read 7.8 and 7.9 : synchronization applications
118
Ch. 8 Deadlocks
8.1 System Model
• Resources: types (e.g., printers, memory),
instances (e.g., 5 printers)
• A process: must request a resource before using it
and must release it after using it (i.e., request =>
use => release)
• request/release device, open/close file,
allocate/free memory
• What cause deadlock?
119
8.2 Deadlock Characterization
• Necessary conditions:
1. Mutual exclusion
2. Hold-and-wait
3. No preemption
4. Circular wait
Resource-allocation graph
• Request edge: P->R
• Assignment edge: R->P
(Figure: a resource-allocation graph with processes P1, P2, P3 and resources R1, R2, R3)
120
Deadlock Characterization
• If each resource has only one instance, then a
cycle implies that a deadlock has occurred
• If each resource has several instances, a cycle may
not imply a deadlock (a cycle is a necessary but
not a sufficient condition)
(Figures: one graph with the cycle P1->R1->P2->R3->P3->R2->P1, in which P1, P2, and P3 deadlock; another graph with the cycle P1->R1->P3->R2->P1 plus processes P2 and P4 outside the cycle, where there is no deadlock. Why?)
121
8.3 Methods for Handling
Deadlocks
• Deadlock prevention
• Deadlock avoidance (deadlock detection)
• Deadlock recovery
• Do nothing: UNIX, JVM (left to the programmer)
• Deadlocks occur very infrequently (once a year?).
It's cheaper to do nothing than to implement
deadlock prevention, avoidance, or recovery
122
8.4 Deadlock Prevention
• Make sure the four conditions will not occur
simultaneously
• Mutual exclusion: must hold for nonsharable
resources
• Hold-and-wait: guarantee that when a process requests a
resource, it does not hold any other resources (low
resource utilization; starvation is possible)
• No preemption: preempt the resources of a process
that requests a resource it cannot immediately obtain
• Circular wait: impose a total ordering of all
resource types, and require that processes request resources in
increasing order. WHY???
123
8.5 Deadlock Avoidance
• Claim edge: a process declares the number of resources it
may need before requesting them
• The OS will grant the resources to a requesting
process IF there is no potential deadlock (safe
state)
(Figure: a resource-allocation graph with a claim edge from P2 to R2; the state becomes unsafe if R2 is assigned to P2, because a cycle forms)
124
8.6 Deadlock Detection
• Wait-for-graph
• Detect a cycle: O(n^2) => expensive
(Figure: a resource-allocation graph over P1, P2, P3 and R1, R2, R3, and its corresponding wait-for graph)
125
8.7 Recovery from Deadlock
Process termination:
• Abort all deadlocked processes (a great expense)
• Abort one process at a time until the deadlock
cycle is eliminated
Resource preemption
• Selection of a victim
• Rollback
• Starvation
126
Ch. 9 Memory Management
9.1 Background
• Address binding: map logical address to physical
address
• Compile time
• Load time
• Execution time
FIG 9.1
127
Background
• Virtual address: logical address space
• Memory-management unit (MMU): a hardware
unit to perform run-time mapping from virtual to
physical addresses
• Relocation register -- FIG 9.2
• Dynamic loading: a routine is not loaded until it is
called (efficient memory usage)
• Static linking and dynamic linking (shared
libraries)
128
Background
• Overlays: keep in memory only the instructions
and data that are needed at any given time
• Assume 1) only 150k memory 2) pass1 and pass2
don’t need to be in the memory at the same time
1. Pass1: 70k
2. Pass2: 80k
3. Symbol table: 20k
4. Common routines: 30k
5. Overlay driver: 10k
1+2+3+4+5=210k > 150k
Overlay1: 1+3+4+5=130k; overlay2: 2+3+4+5=140k < 150k
(FIG9.3)
129
9.2 Swapping
• Swapping: memory<=>backing store (fast disks)
(FIG9.4)
• The main part of swap time is transfer time:
proportional to the amount of memory swapped
(1M ~ 200ms)
• Constraint on swapping: the process must be
completely idle, especially with no pending IO
• Swap time is long: the standard swapping method
is used in few systems
130
9.3 Contiguous Memory
Allocation
• Memory: 2 partitions: system (OS) and users’
processes
• Memory protection:OS/processes, users processes
(FIG9.5)
• Simplest method: divide the memory into a
number of fixed-sized partitions. The OS keeps a
table indicating which parts of memory are
available and which parts are occupied
• Dynamic storage allocation: first fit (generally
fast), best fit, and worst fit
131
Contiguous Memory Allocation
• External fragmentation: statistical analysis on first
fit shows that given N blocks, 0.5N blocks will be
lost due to fragmentation (50-percent rule)
• Internal fragmentation: unused space within the
partition
• Compaction: one way to solve external
fragmentation but only possible if relocation is
dynamic (WHY?)
• Other methods: paging and segmentation
132
9.4 Paging
• Paging: permits the logical address space of a process
to be noncontiguous
• Frames: the physical memory divided into fixed-sized blocks
• Pages: the logical memory divided into fixed-sized
blocks
• Address = page number + page offset: the page number
is an index into a page table (see the sketch below)
• The page and frame sizes are determined by the
hardware.
• FIG9.6, FIG9.7, FIG9.8
133
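A minimal sketch (the page size and page-table contents are made-up values) of splitting a logical address into a page number and offset and translating it through the page table:

#include <stdio.h>

#define PAGE_SIZE 4096                               /* 2^12-byte pages (assumed) */

static unsigned long page_table[] = {5, 9, 2, 7};    /* page -> frame mapping (made-up) */

unsigned long translate(unsigned long logical) {
    unsigned long page   = logical / PAGE_SIZE;      /* index into the page table */
    unsigned long offset = logical % PAGE_SIZE;      /* unchanged by translation */
    return page_table[page] * PAGE_SIZE + offset;    /* physical address */
}

int main(void) {
    printf("logical 0x%x -> physical 0x%lx\n", 0x1234, translate(0x1234));
    return 0;
}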
Paging
• No external fragmentation but internal
fragmentation still exists
• To reduce internal fragmentation: small-sized page
but increase the overhead of page table entry
• What about on-the-fly page-size support?
• With page: user <=> address-translation hardware
<=> actual physical memory
• Frame table: OS needs to know the allocation
details of the physical memory (FIG9.9)
134
Paging
Structure of the page table
• Registers: fast but expensive: suitable for small
tables (256 entries)
• Page-table base register (PTBR): points to the
page table (which resides in main memory):
suitable for large tables (1M entries) but requires two
memory accesses to access a byte
• Using associative registers or translation look-aside
buffers (TLBs) to speed this up
• Hit ratio: effective memory-access time (FIG9.10) (see the worked example below)
135
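As a worked example (the timings are typical textbook values, not taken from these slides): with a 20 ns TLB lookup, a 100 ns memory access, and an 80% hit ratio, the effective access time is 0.80 x (20 + 100) + 0.20 x (20 + 100 + 100) = 140 ns; raising the hit ratio to 98% gives 0.98 x 120 + 0.02 x 220 = 122 ns.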
Paging
Protection
• Protection bits: one bit to indicate a page to be
read and write or read only
• Valid-invalid bit: indicates whether the page is in
the process’s logical address space FIG9.11
• Page-table length register (PTLR) to indicate the
size of the page table: a process usually only uses
a small fraction of the address space available to it
136
Paging
Multilevel paging
• Supporting large logic address space
• The page table may be extremely large (32-bit address space with
4K (2^12) pages => 1M (2^20) page-table entries; at 4 bytes per entry => a 4 MB page table)
• FIG9.12, FIG9.13
• How does multilevel paging affect system
performance? 4-level paging=4 memory accesses
137
Paging
Inverted page table
• One page-table entry per page => millions of entries => consume a
large amount of physical memory
• Inverted page table: one entry per physical frame, so the entry
(index) of the table is tied directly to the physical memory
• May need to search the whole page table
sequentially
• Using hashed table to speed up this search
• FIG9.14
138
Paging
Shared pages
• Reentrant code: is non-self-modifying code, it will
never change during execution
• If the code is reentrant, it can be shared
• FIG9.15
• Inverted page tables have difficulty implementing
shared memory. WHY?
• Work for two virtual addresses that are mapped to
one physical address
139
9.5 Segmentation
• Segment: variable-sized (page: fixed-sized)
• Each segment has a name and length
• Segment table: base (starting physical address)
and limit (length of the segment)
• FIG9.17, 9.18
• Advantage:
1. association with protection(HOW?) the memorymapping hardware can check the protection-bits
associated with each segment-table entry
2. Permits the sharing of code or data (FIG9.19).
Need to search the shared segment’s number
140
Segmentation
Fragmentation
• May cause external fragmentation: when all
blocks of free memory are too small to
accommodate a segment
• What’s the suitable segment size?
• Per segment for each process <=> per segment for
per byte
141
9.6 Segmentation with Paging
• Local descriptor table (LDT): private to the
process
• Global descriptor table (GDT): shared among all
processes
• Linear address
• FIG9.20
142
Ch. 10 Virtual Memory
10.1 Background
• Virtual memory: execution of processes that may
not be completely in memory
FIG10.1
• Programs size > physical memory size
• Virtual space: programmers can assume they have
unlimited memory for their programs
• Increasing memory utilization and throughput:
many programs can reside in memory and run
at the same time
• Less IO would be needed to swap users’ programs
into memory => run faster
• Demand paging and demand segmentation (more
complex due to varied sizes)
143
10.2 Demand Paging
• Lazy swapper: never swaps a page into memory
unless it is needed
• Valid/invalid bit: indicates whether the page is in
memory or not (FIG10.3)
• Handling a page fault (FIG10.4).
• Pure demand paging: never bring a page into
memory until it is required (execution proceeds one page at a
time)
• One instruction may cause multiple page faults (1
page for instruction and several for data) : not so
bad because Locality of reference!
144
Demand Paging
• EX: three-address instruction C=A+B: 1) fetch
instruction, 2) fetch A, 3) fetch B, 4) add A, B, and
5) store to C. The worst case: 4 page-faults
• The hardware for supporting demand paging: page
table and secondary memory (disks)
• Page-fault service: 1) interrupt, 2) read the page,
and 3) restart the process
• Effective access time (EAT):
ma: memory access time (10-200 ns)
p: the probability of a page fault (0 <= p <= 1)
Disk: 15 ms seek, 8 ms average latency, 1 ms transfer
EAT = (1-p)*ma + p*(page-fault time)
    = 100 + 24,999,900*p   (ma = 100 ns, page-fault time = 25 ms)
=> for less than 10% degradation, p < 0.0000004, i.e., at most 1 page fault per 2,500,000 memory accesses
145
10.3 Page Replacement
• Over-allocating: increase the degree of
multiprogramming
• Page replacement: 1) find the desired page on
disk, 2) find a free frame -if there is one then use
it; otherwise, select a victim by applying a page
replacement algorithm, write the victim page to
the disk and update the page/frame table, 3) load
the desired page to the free frame, 4) restart the
process
• Modify (dirty) bit: reduces the overhead: only if the
page is dirty (i.e., it has been changed) do we
have to write it back to the disk.
146
Page Replacement
• Need a frame-allocation and a page-replacement
algorithm: lowest page-fault rate
• Reference string: page faults vs # frames analysis
FIFO page replacement:
• Simple but not always good (FIG10.8)
• Belady's anomaly: page faults may increase as the
number of frames increases!!! (FIG10.9) (see the example below)
147
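A worked illustration (the reference string is the one commonly used to demonstrate the anomaly; it is assumed here, not listed on the slide): for the string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5, FIFO replacement produces 9 page faults with 3 frames but 10 page faults with 4 frames.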
Page Replacement
Optimal page replacement:
• Replace the page that will not be used for the
longest period of time (FIG10.10)
• Has the lowest page-fault rate for a fixed number
of frames (the optimum solution)
• Difficult to implement: WHY? => need to predict
the future usage of the pages!
• Can be used as a reference point!
148
Page Replacement
LRU page replacement:
• Replace the page that has not been used for the longest
period of time (FIG10.11)
• The results are usually good
• How to implement it? 1) counter and 2) stack
(FIG10.12)
• Stack algorithms (LRU) will not suffer from
Belady’s anomaly
149
Page Replacement
LRU approximation page replacement
• Reference bit: set by hardware: indicates whether
the page is referenced
• Additional-reference-bits algorithm: at regular
intervals, the OS shifts the reference bit into the MSB
of an 8-bit byte (11000000 has been used more
recently than 01011111)
• Second-chance algorithm: if the reference bit is 1, give the page a
second chance and reset the bit; implemented with a circular
queue (FIG10.13)
150
Page Replacement
Enhanced second-chance algorithm
• (0,0): neither recently used nor modified - best one
to replace
• (0,1): not recently used but modified - need to
write back
• (1,0): recently used but clean - probably will be
used again
• (1,1): recently used and modified
• We may have to scan the circular queue several
times before we can find the page to be replaced
151
Page Replacement
Counting-based page replacement
• the least frequently used (LFU) page-replacement
algorithm
• the most frequently used (MFU) page-replacement
algorithm
Page-buffering algorithm
• Keep a pool of free frames: the desired page can be read
into a free frame before the victim page has to be
written out
152
10.4 Allocation of Frames
• How many free frames should each process get?
Minimum number of frames
• It depends on the instruction-set architecture: we
must have enough frames to hold all the pages that
any single instruction can reference
• It also depends on the computer architecture: ex.
PDP11 some instructions have more than 1 word
(it may straddle 2 pages) in which 2 operands may
be indirect reference (4 pages) => needs 6 frames
• Indirect address may cause problem (we can limit
the levels of indirection e.g., 16)
153
Allocation of Frames
Allocation Algorithms
• Equal allocation
• Proportional allocation: allocating memory to each
process according to its size
• Global allocation: allows high-priority processes to
select frames from low-priority processes
(problem? a process cannot control its own page-fault rate)
• Local allocation: each process selects from its own
set of frames
• Which one is better? Global allocation: high
throughput
154
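A worked example of proportional allocation (the sizes are illustrative, not from the slide): with m = 62 free frames and two processes of size s1 = 10 and s2 = 127 pages (S = 137), process 1 gets about 10/137 x 62, roughly 4 frames, and process 2 gets about 127/137 x 62, roughly 57 frames.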
10.5 Thrashing
• Thrashing: high paging activity (a severe
performance problem)
• A process is thrashing if it is spending more time
paging than executing
• The CPU scheduler: decreasing CPU utilization
=> increases the degree of multiprogramming =>
more page faults => gets worse and worse
(FIG10.14)
• Preventing thrashing: we must provide a process with as
many frames as it needs
• Locality: a process executes from locality to locality
155
Thrashing
• Suppose we allocate enough frames to a process to
accommodate its current locality. It will not fault
until it changes its localities
Working-set model => locality
• Working-set: the most active-used pages within
the working-set window (period) (FIG10.16)
• The accuracy of the working set depends on the
selection of the working-set window (too small: it
will not encompass the entire locality; too large: it will
overlap several localities)
156
Thrashing
WSS_i: the working-set size of process i
D: the total demand for frames
D = sum of WSS_i
• If the total demand is greater than the total number
of available frames (D > m), thrashing will occur
• If D <= m, allocate frames to the processes; otherwise the OS
suspends some processes
• Difficulty: how to keep track of the working set (it's a moving window)
157
Thrashing
• Page-fault frequency (PFF): establish lower and upper bounds on the desired page-fault rate
(FIG10.17)
• Below the lower bound: the process may have too
many frames (remove frames from it)
• Above the upper bound: the process may not have
enough frames (add frames to it)
158
10.6 OS Examples
• NT: demand paging with clustering, working-set
minimum/maximum, automatic working-set
trimming
• Solaris 2: minfree/lotsfree, pageout starts when
reaches minfree, two-handed-clock algorithm
159
10.7 Other Considerations
Prepaging
• bring into memory at one time all the pages that
will be needed
s pages are prepaged; alpha: the fraction of s actually used
(0 <= alpha <= 1)
• if alpha -> 0, prepaging loses; if alpha -> 1, prepaging
wins
Page size
• How to determine page size?
• Size of the page table: large page size => small
page table
160
Other Considerations
• Smaller page size => smaller internal
fragmentation => better memory utilization
• Page read/write time: large page size to minimize
IO time
• Smaller page size => better locality (resolution)
=> less IO time
• The historical trend is toward large page sizes:
WHY? High CPU/memory speed compared to
disk speed but more internal fragmentation
161
Other Considerations
• Inverted page table: reduce the virtual-to-physical
translation time
• Program structure: increase locality => lower
page-fault rate (e.g., stack is good, hashing is not
good)
• The compiler and loader also affect paging:
separating code (which is never modified) from
data
• Frequent use of pointers (C, C++) tends to
randomize memory access: not good; Java is
better : no pointers
162
Other Considerations
IO interlock
• Allow some of the pages to be locked in memory
(for IO operations)
• It may be dangerous because it may get turned on
but never turned off
163
Ch. 11 File Systems
11.1 File Concept
• File: a named collection of related information
that is recorded on secondary storage
• Types: text (source), object (executable) files
• Attributes: name, type, location, size, protection,
time/date/user id
• Operations: creating, writing, reading,
repositioning, deleting, truncating (delete the
content only), appending, renaming, copying
• Info associated with an open file: file pointer, file
open count, disk location of the file
164
File Concept
• Memory mapping: multiple processes may be
allowed to map the same files into the virtual
memory of each, to allow sharing of data
(FIG11.1)
• File types: name => a name + an extension
• File structure: the more structures there are, the more the OS needs to
support => support a minimum number of file
structures (UNIX, MS-DOS: a file is a sequence of 8-bit bytes with no
interpretation): each application program needs to
provide its own code to interpret an input file into
the appropriate structure
165
File Concept
• Internal file structure: packing a number of logical
records into physical blocks (all file systems suffer
internal fragmentation problem)
• Consistency semantics: modifications of data by
one user should be observed by other users
• UNIX: writes to an open file by one user are
visible immediately by other users; sharing the
pointer of current location into the file
166
11.2 Access Methods
• Sequential access (order)
• Direct access (no order, random): relative block
number - an index relative to the beginning of the
file
• Index file: contains pointers to various blocks
• Index-index file (if index is too big)
167
11.3 Directory Structure
• The file system is divided into partitions
(IBM:minidisks, PC/Macintosh: volumes)
• Partitions can be thought of as virtual disks
• Device directory or volume table of contents
(FIG11.6)
• Operations on directory: search a file, create a file,
delete a file, list a directory, rename a file, traverse
the file system
168
Directory Structure
Single-level directory
• Simple but not good when too many files or too
many users
• All files are in the same directory and need unique
names
Two-level directory
• User has his/her own user file directory (UFD)
under the system’s master file directory (MFD)
• Solving name-collision problem but isolating user
from each other (not good for cooperation)
• Path name, search path
169
Directory Structure
Tree-structured directory
• A directory contains a set of files or subdirectories
• Each user has a current directory
• Path names: absolute-begins at the root
(root/spell/mail); relative-from the current
directory (prt/first => root/spell/mail/prt/first)
• How to delete a directory? : directory must be
empty: what if the directory contains several
subdirectories?
• UNIX “rm -r”: removes all, but dangerous
• Users can access their and all others’ files
170
Directory Structure
Acyclic-graph directories
• Can a tree structure share files and directories?
NO
• A graph with no cycles, allows directories to have
shared directories and files (FIG11.10)
• UNIX: link: a pointer to another file or
subdirectory
• Symbolic link: a link is implemented as an
absolute or relative path name
• Duplicate all info in both sharing directories:
consistency issue
171
Directory Structure
• Concerns for implementing an acyclic-graph
directory
1. A file may have multiple absolute path names
2. When can the space allocated to a shared file be
deleted and reused?
• Both are easy to handle by using symbolic links
• Need a mechanism to indicate when all the
references to a file have been deleted
1. File-reference list: potentially large size
2. Counter: 0 = no more references: cheaper (UNIX:
hard links)
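• A minimal sketch of the counter approach in C (not from the text; the
inode fields shown are hypothetical):

    /* Each file (inode) keeps a count of hard links; the file's
       blocks are reclaimed only when the last link is removed. */
    #include <stdlib.h>

    struct inode {
        int link_count;   /* number of directory entries referring to this file */
        /* ... block pointers, size, owner, ... */
    };

    void add_link(struct inode *ip)    { ip->link_count++; }

    void remove_link(struct inode *ip)
    {
        if (--ip->link_count == 0)
            free(ip);     /* stand-in for freeing the inode and its data blocks */
    }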
172
Directory Structure
• The acyclic graph is complicated, so some systems
do not allow shared directories or links (MS-DOS: tree)
General graph directory
• A problem with the acyclic graph is that it is difficult
to ensure that there are no cycles. WHY?
• Adding links to a tree-structured directory turns it
into a general graph directory (FIG11.11)
• Cycle problem: one solution is to limit the depth
(number of directories) of search
• Garbage collection: 1st pass identifies the
files/directories that can be freed, 2nd pass frees the
space: time consuming
173
11.4 Protection
• Reliability: guarding against physical damage
(duplicate copies of files)
• Protection: guarding against improper access
• Controlled access: read, write, execute, append,
delete, list
• Access list and group: owner, group, universe
(UNIX: rwx)
• Other protection approaches: one password for
every file (who can remember so many
passwords?), one password for a set of files
174
11.5 File-System Structure
• To improve IO efficiency, data is transferred
between disk and memory in blocks
• File-systems: allow the data to be stored, located,
and retrieved easily.
• Two issues: 1) how the file system should look to
the user, and 2) the algorithms and data structures
needed to implement it
• A layered design: application programs => logical
file system (directories) => file-organization
module => basic file system => IO control =>
devices
175
File-System Structure
• Open-file table (FIG11.14): files must be opened
before they can be used for IO procedures
• File descriptor, file handle (NT), file control block
• Mounted: the file system must be mounted before
it can be available to processes on the system
(/home/jane = /user/jane)
176
11.6 Allocation Methods
• Contiguous, linked, and indexed
Contiguous allocation
• Each file to occupy a set of contiguous blocks on
the disk (FIG11.15)
• The number of disk seeks is minimal
• It’s easy to access a file which is allocated to a set
of contiguous blocks
• Support both sequential and direct access
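• A minimal sketch (in C, with hypothetical field names) of why direct
access is cheap under contiguous allocation:

    /* With contiguous allocation, the directory entry only needs the
       starting block and the length; any logical block maps directly. */
    struct file_entry { int start; int length; };   /* both in blocks */

    /* Returns the physical block holding logical block lb, or -1 if lb
       is outside the file. Direct access costs one computation. */
    int physical_block(const struct file_entry *f, int lb)
    {
        if (lb < 0 || lb >= f->length)
            return -1;
        return f->start + lb;
    }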
• Difficult to find space for a new file (dynamic
storage-allocation algorithm: best-fit, first-fit,
worst-fit)
177
File-System Structure
• Causes external fragmentation
• Run a repacking routine: disk -> memory -> disk:
effective but time consuming (needs down time)
• How much space is needed for a file?
1. Too little: the file cannot be extended
2. Too much: wasted space
• Re-allocation: find a larger hole, copy the
contents, and repeat the process: slow!!!
• Preallocate, then extend with an extent (linked to
the original allocation)
178
File-System Structure
Linked allocation
• Each block contains a pointer to the next block
(FIG11.16)
• No external fragmentation: no need to compact
space
• Disadvantages:
1. Can be used effectively only for sequential-access
files; inefficient for direct-access files
2. Overhead of the pointers
179
File-System Structure
• One solution to the problems: collect blocks into
clusters
• Reliability is another problem: what if a link is
missing?
• File allocation table (FAT): MS-DOS and OS/2:
FIG11.17
• The FAT allocation scheme may need a significant
number of disk head seeks, unless the FAT is cached
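• A minimal sketch of following a FAT chain (an in-memory table; real FATs
also reserve values for free blocks, simplified away here):

    /* fat[b] holds the number of the block that follows block b in the
       file; EOF_MARK ends the chain. Reaching logical block lb costs lb
       table lookups (and disk seeks if the FAT is not cached). */
    #define EOF_MARK (-1)

    int fat_block(const int *fat, int first_block, int lb)
    {
        int b = first_block;
        while (lb-- > 0 && b != EOF_MARK)
            b = fat[b];
        return b;                 /* EOF_MARK if the file is too short */
    }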
180
File-System Structure
Indexed allocation
• Index block: bringing all the pointers into one place
(see the sketch after this list)
• Each file has its own index block that contains an
array of disk-block addresses (FIG11.18)
• Support direct access, no external fragmentation
• Overhead of index block > pointers of linked
allocation
• How large should the index block be?
1. Linked scheme
2. Multilevel index
3. Combined scheme (FIG11.19)
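• A minimal sketch of direct access through a single-level index block
(hypothetical layout; 128 entries assumes 512-byte blocks and 4-byte
pointers):

    /* The index block is simply an array of disk-block addresses, so the
       ith logical block is found with one lookup, wherever it is on disk. */
    struct index_block { int addr[128]; };

    int indexed_block(const struct index_block *ix, int lb)
    {
        if (lb < 0 || lb >= 128)
            return -1;        /* beyond one index block: needs a multilevel index */
        return ix->addr[lb];
    }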
181
File-System Structure
Performance
• Contiguous allocation: needs one access to get a
disk block
• Linked allocation: needs i accesses to get the ith
block: no good for direct access applications
• Use contiguous allocation for direct-access files
and linked allocation for sequential-access files
• Indexed allocation: depends on the index
structure, file sizes, etc. (What if the index block is
too big to stay in memory all the time? Swap the
index block in and out???)
182
11.7 Free-Space Management
• Free-space list
1. Bit vector: each bit represents one block: simple,
but only efficient if the entire bit vector stays in
memory (a 1.3-GB disk with 512-byte blocks needs
a bit vector of about 332 KB)
2. Linked list: traversal needs substantial IO time, but
luckily it is not a frequent action
3. Grouping: store the addresses of n free blocks in
the first free block (n-1 are actually free; the last
points to another block of addresses): a large
number of free blocks can be found quickly
4. Counting: instead of keeping all the addresses, we
only need to store the address of the 1st free block
and a count n of contiguous free blocks
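• A minimal sketch of finding a free block in a bit vector (assuming the
convention 1 = free):

    /* Scan word by word; a nonzero word contains at least one free block.
       Real systems use hardware bit-scan instructions for the inner step. */
    #include <limits.h>

    int first_free_block(const unsigned int *bitmap, int nwords)
    {
        int bits_per_word = (int)(sizeof(unsigned int) * CHAR_BIT);
        for (int w = 0; w < nwords; w++) {
            if (bitmap[w] == 0)
                continue;                 /* every block in this word is allocated */
            for (int b = 0; b < bits_per_word; b++)
                if (bitmap[w] & (1u << b))
                    return w * bits_per_word + b;
        }
        return -1;                        /* disk full */
    }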
183
11.8 Directory Implementation
• Efficiency, reliability, and performance depend on
the choice of directory-management algorithms and
data structures
• Linked list of directory entries: simple but time
consuming to use
1. Create, delete, or reuse a directory entry?
2. Linear search to find a file
3. Cache or sorted list (binary search) to improve it
• Hash table: collision problem (chained-overflow
hash table)
184
11.9 Efficiency and Performance
• Major system bottleneck: disk
• Efficiency: algorithms used
• Performance: cache, free-behind, read-ahead,
virtual disk/RAM disk.
185
11.10 Recovery
• Consistency checking: comparing the data in the
directory structure with the data blocks on disk
• The loss of a directory entry on an indexed
allocation system could be disastrous
• Backup and restore
186
Ch. 12 IO Systems
12.1 Overview
• IO devices vary widely in their function and
speed. Hence we need a variety of methods to
control them
• Two conflicting trends: 1) increasing
standardization of hardware and software interfaces
and 2) an increasingly broad variety of IO devices
• Device-driver modules: kernel, encapsulation
187
12.2 IO Hardware
• Many types of devices: storage devices (disks,
tapes), transmission devices (modems, network
cards), human interface devices (mouse, screen,
keyboard)
• Port, bus, daisy chain (serially connected devices)
• Controller: a serial-port controller
• Host adapter: contains processor, microcode,
memory for complex protocols (SCSI)
• PC bus structure: FIG12.1
188
IO Hardware
• How can the processor give commands and data to
a controller to accomplish an IO transfer?
• Special IO instructions
• Memory-mapped IO: faster if a large amount of
data needs to be transferred (e.g., screen display).
Disadvantage? A software fault (a stray write) can
corrupt device state!
• IO port (4 registers): status, control, data-in and
data-out
• Some controllers use FIFO
189
IO Hardware
Polling
• Handshaking: in the first step, the host repeatedly
monitors the busy bit: busy-waiting or polling (like
going to the door every minute to check whether
someone is there)
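• A minimal busy-waiting sketch in C (the register addresses and bit layout
are made up):

    /* The host spins on the controller's status register until the busy
       bit clears, then writes a byte to the data-out register. */
    #include <stdint.h>

    #define BUSY_BIT 0x01

    volatile uint8_t *status_reg   = (uint8_t *)0x40001000;  /* hypothetical */
    volatile uint8_t *data_out_reg = (uint8_t *)0x40001004;  /* hypothetical */

    void poll_and_write(uint8_t byte)
    {
        while (*status_reg & BUSY_BIT)
            ;                      /* busy-wait: CPU cycles are wasted here */
        *data_out_reg = byte;      /* hand one byte to the controller */
    }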
Interrupts
• Interrupt-request line: like ringing a doorbell to
indicate that someone is at the door
• Interrupt-driven IO cycle: FIG12.3
190
IO Hardware
• Interrupt-request lines: nonmaskable interrupt -
reserved for events such as nonrecoverable errors;
maskable interrupt - can be turned off by the CPU
• Interrupt vector: the memory addresses of
specialized interrupt handlers
• Interrupt priority levels
• Interrupts in an OS: at boot time, during IO,
exceptions
• Other uses of interrupts: page faults (virtual
memory), system calls (software interrupt or trap),
managing control flow (yielding a low-priority
job to a high-priority one)
191
IO Hardware
• A threaded kernel architecture is well suited to
implementing multiple interrupt priorities and to
enforcing the precedence of interrupt handling over
background processing in kernel and application
routines
Direct memory access (FIG12.5)
• Programmed IO (PIO): 1 byte transfer at a time
• DMA controller operates on memory bus directly
• DMA seizes the memory bus => the CPU cannot
access main memory, although it can still access
data in the primary and secondary caches =>
CYCLE STEALING
192
12.3 Application IO Interface
• A kernel IO structure (FIG12.6): encapsulation
• Devices’ characteristics (FIG12.7)
1. Character stream or block
2. Sequential or random-access
3. Synchronous or asynchronous
4. Sharable (can be used by concurrent threads) or
dedicated
5. Speed of operation
6. Read/write, read only or write only
• Escape or back-door system call (UNIX ioctl)
193
Application IO Interface
Block and character devices
• Block device: read, write, seek (memory-mapped
file access can be layered on top of block-device
drivers)
• Character-stream: keyboard, mouse, modems
Network devices
• Network socket interface (UNIX, NT)
Clocks and timers
• Give the current time, the elapsed time, set a timer
to trigger operation X at time T (programmable
interval timer)
194
Application IO Interface
Blocking and nonblocking IO
• Blocking IO system call: the execution of the
application is suspended (run->wait queue)
• Nonblocking IO: e.g., interfacing with the mouse
and keyboard while processing and displaying data
on the screen:
1. One solution: overlap execution with IO using a
multithreaded application
2. Asynchronous system call: returns immediately
without waiting for the IO to complete
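• One way (a sketch, not the only approach) to check the keyboard without
blocking, using POSIX select() with a zero timeout:

    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    /* Returns 1 if a read on stdin would not block, 0 otherwise.
       The zero timeout makes select() return immediately. */
    int input_ready(void)
    {
        fd_set readfds;
        struct timeval tv = {0, 0};

        FD_ZERO(&readfds);
        FD_SET(STDIN_FILENO, &readfds);
        return select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv) > 0;
    }

    /* Usage: the main loop keeps processing and drawing, and only reads
       when data is already there, so it never stalls on the keyboard. */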
195
12.4 Kernel IO Subsystem
IO scheduling
• To improve the overall performance
Buffering
• Cope with speed mismatching
• Cope with different data-transfer sizes
• Support copy semantics for application IO
• Double buffering: write to one and read from
another one
196
Kernel IO Subsystem
Caching
• Using a fast memory to hold copies of data
Spooling and device reservation
• A spool is a buffer that holds output for a device,
such as a printer, that can serve only one job at a time
• Each application’s output is spooled to a separate
disk file.
Error handling
• An OS uses protected memory
• Sense key (SCSI protocol to identify failures)
197
Kernel IO Subsystem
Kernel data structures
• UNIX IO kernel structure (FIG12.9)
• The IO subsystem supervises:
1. Management of the name space for files/devices
2. Access control to files/devices
3. Operation control
4. File system space allocation
5. Device allocation
6. Buffering, caching, and spooling
7. IO scheduling
8. Device status monitoring, error handling and failure recovery
9. Device driver configuration and initialization
198
12.5 IO Requests Handling
• How’s the connection made from the file name to
the disk controller?
• Device table (MS-DOS): “c:”
• Mount table (UNIX):
• Stream (UNIX V): a full-duplex connection
between a device driver and a user-level process
• The life cycle of an IO request (FIG12.10)
199
12.6 Performance
• IO is a major factor in system performance
• Context switching is a main factor
• Interrupt is relatively expensive: state change =>
execute the interrupt handler => restore the state
• Network traffic can also cause a high context-switch rate (FIG12.11)
• Telnet daemon (Solaris): using in-kernel threads to
eliminate the context switches
• Front-end processor, terminal concentrator
(multiplexing many remote terminals to one port),
IO channel
200
Performance
Several principles to improve efficiency of IO:
• Reduce # of context switches
• Reduce # of times that data needs to be copied in
memory while it passes between device and
application
• Reduce the frequency of interrupts
• Increase the use of DMA
• Move processing primitives into hardware
(allowing concurrent CPU and bus operations)
• Balance load between CPU, memory subsystem,
and IO
201
Performance
• Device-functionality progression (FIG12.12)
• Where should the IO functionality be
implemented???
• Application level: easy, flexible, inefficient (high
context-switch overhead)
• Kernel level: difficult but efficient
• Hardware level: inflexible, expensive
202
Ch. 13 Mass-Storage Structure
13.1 Disk Structure
• Magnetic tape (slower than disk): backup use
• Converting a logical block number to a disk address
raises two problems: 1) most disks have some
defective sectors and 2) the # of sectors per track is
not constant
• Cylinder, track, sector
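• A minimal sketch of the idealized mapping (it assumes a constant number
of sectors per track, exactly the assumption that problem 2 above breaks):

    /* Maps a logical block number to (cylinder, track, sector) on an
       idealized disk with fixed geometry and no remapped defects. */
    struct chs { int cylinder, track, sector; };

    struct chs block_to_chs(int lbn, int tracks_per_cyl, int sectors_per_track)
    {
        struct chs p;
        p.sector   = lbn % sectors_per_track;
        p.track    = (lbn / sectors_per_track) % tracks_per_cyl;
        p.cylinder = lbn / (sectors_per_track * tracks_per_cyl);
        return p;
    }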
203
13.2 Disk Scheduling
• Seek time: the time for the disk arm to move the
heads to the cylinder containing the desired sector
• Rotational latency: the time waiting for the disk to
rotate the desired sector to the head
• Bandwidth: the total # bytes transferred, divided
by the total time from request to completion
FCFS scheduling (first-come-first-serve)
• Simple but not generally provide fastest service
(FIG13.1)
204
Disk Scheduling
SSTF scheduling (shortest-seek-time-first)
• Selects the request with the minimum seek time
from the current head position (FIG13.2)
• It may cause starvation of some requests
• It is much better than FCFS but not optimal
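• A minimal SSTF sketch in C (illustrative only; cylinder numbers are
assumed to be non-negative):

    #include <stdlib.h>

    /* Repeatedly picks the pending request nearest the head, services it,
       and moves the head there. Greedy, so good seek times on average,
       but requests far from a busy region can starve. */
    void sstf(int head, int *req, int n)
    {
        for (int served = 0; served < n; served++) {
            int best = -1;
            for (int i = 0; i < n; i++)
                if (req[i] >= 0 &&
                    (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                    best = i;
            head = req[best];       /* seek to the closest request */
            req[best] = -1;         /* mark it as serviced */
        }
    }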
SCAN scheduling
• The head continuously scans back and forth across
the disk (FIG13.3): also called elevator algorithm
C-SCAN scheduling
• When the head reaches one end, it immediately
returns to the other end (FIG13.4)
205
Disk Scheduling
LOOK scheduling
• The head goes only as far as the final request in
each direction, then reverses (without going all the
way to the end of the disk) (FIG13.5)
Selection of a disk-scheduling algorithm
• Heavy load: SCAN and C-SCAN perform better
because they have no starvation problem
• Finding an optimal schedule? The computational
expense may not justify the savings over SSTF or
SCAN
• Request performance may be affected by the
file-allocation method (contiguous vs linked/indexed
files)
206
Disk Scheduling
• Request performance is also affected by the location
of directories and index blocks (e.g., the index
block is on the 1st cylinder and the data is on the
last one)
• Caching the directories and index blocks in main
memory will help
• OS should have a module that includes a set of
scheduling algorithms for different applications
• What if the rotational latency is nearly as large as
the average seek time?
207
13.3 Disk Management
• Disk formatting: low-level (physical) formatting:
divides the disk into sectors that the disk controller
can read and write
• Error-correcting code (ECC): write data => the
controller computes and stores the ECC => read data
=> recompute the ECC => if they match, OK => if
not, there is an error, which may be corrected
• Using a disk to hold files: the OS needs to record its
own data structures on the disk
1. Partition the disk into several groups of cylinders
2. Logical formatting (making a file system)
208
Disk Management
• Boot block (stored in ROM), bootstrap loader,
boot disk (system disk)
• Bad blocks: why are disks so prone to defects?
(moving parts)
• What if formatting finds a bad block? MS-DOS
style: mark the FAT entry as bad; more sophisticated
schemes:
1. Sector sparing (forwarding): the OS preserves some
spare sectors to replace bad blocks
2. Sector slipping: if sector 17 is bad, the data in
sectors 17-100 is shifted down into sectors 18-101
• Can the replacement be fully automatic? No, users
may need to recover the lost data manually
209
13.4 Swap-Space Management
• Main goal: provide the best throughput for the
virtual-memory system
• Swap space ranges from a few megabytes to
hundreds of megabytes: it is safer to overestimate
than to underestimate the swap space. WHY?
• Swap-space location: in the file systems (easy but
inefficient) or in a separate disk partition (efficient
but internal fragmentation may increase)
• In 4.3BSD, swap space is allocated to a process
when it is started: text segment (FIG13.7) and
data segment (FIG13.8) = swap map
210
13.5 Reliability
• Disks used to be the least reliable component of a
system (disk crash)
• Disk striping (interleaving): uses a group of disks
as one storage unit: improve performance and
reliability
• Redundant array of independent disks (RAID)
• Mirroring or shadowing: keeps a duplicate copy of
each disk
• Block interleaved parity: a small fraction of the
disk space is used to hold parity blocks
• What are the overheads?
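• A minimal sketch of the parity computation (among the overheads: one
extra block per stripe plus an XOR pass on every write):

    #include <stddef.h>

    /* parity[i] = data0[i] ^ data1[i] ^ ... ^ dataN-1[i].
       If one data block is lost, XORing the parity with the surviving
       blocks regenerates it; the space cost is one block in N+1. */
    void compute_parity(char *parity, char *const data[], int ndisks, size_t blksize)
    {
        for (size_t i = 0; i < blksize; i++) {
            char p = 0;
            for (int d = 0; d < ndisks; d++)
                p ^= data[d][i];
            parity[i] = p;
        }
    }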
211
13.6 Stable-Storage
Implementation
• Information residing in stable storage is never lost
• Whenever a failure occurs during the writing of a
block, the system recovers and restores the block
(a sketch follows the list below):
1. Write info to the first physical block
2. If it completed successfully, write the same info to
the second physical block
3. Declare the operation complete only after the
second write completes successfully
• Data will be safe unless all copies are destroyed
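• A minimal sketch of the two-copy write using POSIX calls (the two file
descriptors stand in for two independent devices; fsync() stands in for the
controller's completion report):

    #include <sys/types.h>
    #include <unistd.h>

    /* Writes the block to two separate devices, declaring success only
       after both writes complete. Recovery after a crash compares the two
       copies and repairs whichever one is bad or stale. */
    int stable_write(int fd1, int fd2, const void *block, size_t n, off_t off)
    {
        if (pwrite(fd1, block, n, off) != (ssize_t)n || fsync(fd1) != 0)
            return -1;            /* first copy failed: second is still intact */
        if (pwrite(fd2, block, n, off) != (ssize_t)n || fsync(fd2) != 0)
            return -1;            /* second copy failed: first is already safe */
        return 0;                 /* operation complete */
    }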
212
13.7 Tertiary-Storage Structure
• Removable media
• Removable disks
• Tapes
• What are the considerations of the OS?
• Application interface
• File naming
213
Ch. 14 Network Structures
14.1 Background
• Distributed systems: remote and local resources:
server and client (user)
Advantages:
• Resource sharing
• Computation speedup: load sharing
• Reliability
• Communication: mainframe downsizing
Network operating systems
• Remote login (telnet)
• Remote file transfer (FTP)
214
Background
Distributed operating systems
• Data migration: transfer the entire file, or transfer
only the portions of the file necessary for the
immediate task
• Computational migration: remote procedure call
(RPC)
• Process migration: load balancing, computation
speedup, hardware preference, software
preference, data access
215
14.2 Network Types
Local-area networks (LAN)
• Twisted-pair and fiber-optic cables
• Multicasting, ring, star
Wide-area networks (WAN)
• Arpanet
• Communication processors (FIG14.3)
• Modems
216
14.3 Communication
• Naming and name resolution: domain name
service (DNS): cs.nthu.edu.tw : tw->domain
• Routing strategies: fixed, virtual, dynamic routing:
gateway
• Packet strategies: circuit, message, packet
switching
• Contention: carrier sense with multiple access and
collision detection (CSMA/CD), token passing,
message slots
217
14.4 Communication Protocols
• Physical layer
• Link layer
• Network layer
• Transport layer
• Session layer
• Presentation layer
• Application layer
• FIG14.4, 14.5
218
14.5 Robustness
• Failure detection
• Reconfiguration
• Recovery from failure
14.6 Design Issues
• Transparent to users
• User mobility
• Scalability
• Fault-tolerant systems
219
Ch. 15 Distributed
Communication
• Socket: an endpoint for communication: a
client-server architecture (FIG15.1)
• Servers need to be multithreaded to serve many
incoming requests
• Java: connection-oriented (TCP) and
connectionless (UDP) sockets
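• The slide's example is Java; a minimal connection-oriented (TCP) client
using POSIX sockets in C looks like this (the address and port are made up):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        /* A socket is an endpoint: an IP address plus a port number. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);        /* TCP */
        struct sockaddr_in srv;
        memset(&srv, 0, sizeof srv);
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(8080);                    /* hypothetical port */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);  /* hypothetical address */

        if (connect(fd, (struct sockaddr *)&srv, sizeof srv) == 0)
            write(fd, "hello\n", 6);       /* an unstructured stream of bytes */
        close(fd);
        return 0;
    }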
• Remote procedure call (RPC): sockets provide
only communication of an unstructured stream of
bytes
• Remote method invocation (RMI): Java virtual
machine (JVM): a thread invokes a method on a
remote object
220
Ch. 16 Distributed Coordination
16.1 Event Ordering
• The happened-before relation: dependency
(FIG16.1): a->b and b->c => a->c: implemented
with timestamps
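• A minimal Lamport-timestamp sketch of one way such timestamps can be
maintained (one logical clock per process; not taken from the text):

    /* Local events and sends increment the clock; on receive, the clock
       jumps past the sender's timestamp, so a -> b implies ts(a) < ts(b). */
    struct lclock { long t; };

    long on_local_event(struct lclock *c) { return ++c->t; }
    long on_send(struct lclock *c)        { return ++c->t; }  /* stamp the message */
    long on_receive(struct lclock *c, long msg_ts)
    {
        c->t = (msg_ts > c->t ? msg_ts : c->t) + 1;
        return c->t;
    }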
16.2 Mutual exclusion
• Centralized approach: one of the processors
controls the accesses to critical sections (request,
reply, release)
• Fully distributed approach: very complicated:
HOW?
• Token-passing approach
221
16.3 Deadlock Handling
Deadlock prevention
• The wait-die scheme
• The wound-wait scheme
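• A minimal sketch of the wait-die decision (timestamps are assigned at
process creation; a smaller timestamp means an older process):

    enum action { WAIT, DIE };

    /* Wait-die is non-preemptive: an older requester may wait for a
       younger holder, but a younger requester is rolled back ("dies")
       and later retries with its original timestamp. */
    enum action wait_die(long requester_ts, long holder_ts)
    {
        return (requester_ts < holder_ts) ? WAIT : DIE;
    }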
Deadlock detection
• Local/global wait-for graphs (FIG16.2, 16.3)
• Centralized approach: deadlock-detection
coordinator
• Fully distributed approach
Election algorithms: the bully and ring algorithms
• Coordinator fails (site failure) => restart a new
copy on a new site
222
Ch. 17 Distributed File Systems
17.1 Background
• A DFS is a file system whose clients, servers, and
storage devices are dispersed among the machines
of a distributed system
• The performance measurement is the amount of
time needed to satisfy various service requests
17.2 Naming and transparency
• Location transparency and location independence
• Diskless clients
• Naming schemes: 1) combining host and local
name, 2) automount (SUN), 3) a global name
structure
223
17.3 Remote File Access
• Caching: cache-consistency problem
• Cache location (main memory or disk): pro & con
• Cache-update policy: write-through, delayed-write, write-on-close
• Consistency: client-initiated and server-initiated
approaches
• Stateful vs stateless service
• A NFS example (FIG 17.3)
224
Ch. 18 Protection
18.1 Goals of protection
• Mechanism: how something will be done
• Policies: what will be done
18.2 Domain of protection
• Need-to-know principle
• Protection domain and access right
• Each user, process, or procedure may be a domain
• UNIX: an owner id and a domain bit (setuid bit)
• MULTICS: a ring structure with rings 0-7; if j < i,
then the privileges of ring i are a subset of those of
ring j
225
Protection
18.3 Access matrix (FIG 18.3-7)
18.4 Implementation of access matrix
• Global table
• Access lists for objects
• Capability lists for domains
• A lock-key mechanism
18.5 Revocation of access rights
• Immediate vs delayed
• Selective vs general
• Partial vs total
• Temporary vs permanent
226
Protection
18.6 Language-based protection
• Using a programming language to specify the
desired control of access to a shared resource in a
system amounts to making a declarative statement
about the resource
227
Ch. 19 Security
19.1 The security problem
• Malicious or accidental
• Physical level and human level
19.2 Authentication
• Passwords (password vulnerability)
• Encrypted passwords
• One-time passwords
228
Security
19.3 Program threats
• Trojan horse: systems allow programs written by
one user to be executed by other users; the program
runs with the access rights of the user executing it,
rights in domains its author may not have
• Trap door: a program or system might leave a hole
in the software that only the writer can use
19.4 System Threats
• Worms and viruses
19.5 Threat monitoring
19.6 Encryption
229